Alain Roy Computer Sciences Department University of Wisconsin-Madison 23-June-2002 Introduction to Condor
Good morning! › Thank you for having me! › I am Alain Roy: Computer Science Ph.D. in Quality of Service, with the Globus Project; now working with the Condor Project
Condor Tutorials › Today (Sunday) 10:00-12:30 A general introduction to Condor › Monday 17:00-19:00 Using and administering Condor › Tuesday 17:00-19:00 Using Condor on the Grid
A General Introduction to Condor
The Condor Project (Established 1985) Distributed Computing research performed by a team of about 30 faculty, full-time staff, and students who: face software engineering challenges in a Unix and Windows environment, are involved in national and international collaborations, actively interact with users, maintain and support a distributed production environment, and educate and train students.
A Multifaceted Project › Harnessing clusters—opportunistic and dedicated (Condor) › Job management for Grid applications (Condor-G, DaPSched) › Fabric management for Grid resources (Condor, GlideIns, NeST) › Distributed I/O technology (PFS, Kangaroo, NeST) › Job-flow management (DAGMan, Condor) › Distributed monitoring and management (HawkEye) › Technology for Distributed Systems (ClassAd, MW)
Harnessing Computers › We have more than 300 pools with more than 8500 CPUs worldwide. › We have more than 1800 CPUs in 10 pools on our campus. › Established a “complete” production environment for the UW CMS group › Adopted by the “real world” (Galileo, Maxtor, Micron, Oracle, Tigr, … )
The Grid … › Close collaboration and coordination with the Globus Project—joint development, adoption of common protocols, technology exchange, … › Partner in major national Grid R&D² (Research, Development and Deployment) efforts (GriPhyN, iVDGL, IPG, TeraGrid) › Close collaboration with Grid projects in Europe (EDG, GridLab, e-Science)
[Diagram: a layered architecture with the User/Application layer on top, the Grid layer in the middle, and the Fabric (processing, storage, communication) below; Condor appears both above and below the Globus Toolkit in the Grid layer.]
distributed I/O … › Close collaboration with the Scientific Data Management Group at LBL. › Provide management services for distributed data storage resources › Provide management and scheduling services for Data Placement jobs (DaPs) › Effective, secure and flexible remote I/O capabilities › Exception handling
job flow management … › Adoption of Directed Acyclic Graphs (DAGs) as a common job flow abstraction. › Adoption of the DAGMan as an effective solution to job flow management.
For the Rest of Today › Condor › Condor and the Grid › Related Technologies: DAGMan, ClassAds, Master-Worker, NeST, DaP Scheduler, Hawkeye › Today: Just the “Big Picture”
What is Condor? › Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility. Run lots of jobs over a long period of time, Not a short burst of “high-performance” › Condor manages both machines and jobs with ClassAd Matchmaking to keep everyone happy
Condor Takes Care of You › Condor does whatever it takes to run your jobs, even if some machines… Crash (or are disconnected) Run out of disk space Don’t have your software installed Are frequently needed by others Are far away & managed by someone else
What is Unique about Condor? › ClassAds › Transparent checkpoint/restart › Remote system calls › Works in heterogeneous clusters › Clusters can be: Dedicated Opportunistic
What’s Condor Good For? › Managing a large number of jobs You specify the jobs in a file and submit them to Condor, which runs them all and sends you email when they complete Mechanisms to help you manage huge numbers of jobs (thousands), all the data, etc. Condor can handle inter-job dependencies (DAGMan) › A sample submit file is sketched below
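As a rough sketch of what job submission looks like, here is a minimal submit description file for 600 runs (the executable name and file-naming scheme are hypothetical):

    # hypothetical submit description file for 600 simulations
    universe   = standard
    executable = sim                  # assumed program name
    arguments  = $(Process)           # 0..599, one run per job
    input      = input.$(Process)
    output     = output.$(Process)
    error      = error.$(Process)
    log        = sim.log
    queue 600

Submitting this with condor_submit places all 600 jobs in the queue at once.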
What’s Condor Good For? (cont’d) › Robustness Checkpointing allows guaranteed forward progress of your jobs, even jobs that run for weeks before completion If an execute machine crashes, you only lose work done since the last checkpoint Condor maintains a persistent job queue - if the submit machine crashes, Condor will recover
What’s Condor Good For? (cont’d) › Giving your job the agility to access more computing resources Checkpointing allows your job to run on “opportunistic resources” (not dedicated) Checkpointing also provides “migration” - if a machine is no longer available, move! With remote system calls, run on systems that do not share a filesystem - you don’t even need an account on the machine where your job executes
Other Condor features › Implement your policy on when the jobs can run on your workstation › Implement your policy on the execution order of the jobs › Keep a log of your job activities
A Condor Pool In Action
A Bit of Condor Philosophy › Condor brings more computing to everyone A small-time scientist can make an opportunistic pool with 10 machines, and get 10 times as much computing done. A large collaboration can use Condor to control its dedicated pool with hundreds of machines.
The Condor Idea Computing power is everywhere; we try to make it usable by anyone.
Meet Frieda. She is a scientist. But she has a big problem.
Frieda’s Application … Simulate the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600 combinations) F takes on average 3 hours to compute on a “typical” workstation (total = 1800 hours) F requires a “moderate” (128MB) amount of memory F performs “moderate” I/O - (x,y,z) is 5 MB and F(x,y,z) is 50 MB
I have 600 simulations to run. Where can I get help?
Install a Personal Condor!
Installing Condor › Download Condor for your operating system › Available as a free download from › Not labelled as “Personal” Condor, just “Condor”. › Available for most Unix platforms and Windows NT
So Frieda Installs Personal Condor on her machine… › What do we mean by a “Personal” Condor? Condor on your own workstation, no root access required, no system administrator intervention needed—easy to set up. › So after installation, Frieda submits her jobs to her Personal Condor…
Personal Condor?! What’s the benefit of a Condor “Pool” with just one user and one machine?
Your Personal Condor will... › Keep an eye on your jobs and will keep you posted on their progress › Keep a log of your job activities › Add fault tolerance to your jobs › Implement your policy on when the jobs can run on your workstation
Frieda is happy until… She realizes she needs to run a post-analysis on each job, after it completes.
Condor DAGMan › Directed Acyclic Graph Manager › DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you. › (e.g., “Don’t run job B until job A has completed successfully.”)
What is a DAG? › A DAG is the data structure used by DAGMan to represent these dependencies. › Each job is a “node” in the DAG. › Each node can have any number of “parent” or “child” nodes – as long as there are no loops! [Diagram: a diamond DAG in which Job A is the parent of Jobs B and C, which are both parents of Job D.]
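As a sketch, that diamond DAG is described in a DAG input file like this (the submit-file names are hypothetical):

    # diamond.dag -- hypothetical DAG input file
    JOB A a.submit
    JOB B b.submit
    JOB C c.submit
    JOB D d.submit
    PARENT A CHILD B C
    PARENT B C CHILD D

Running condor_submit_dag diamond.dag hands the whole graph to DAGMan.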
DAGMan Running a DAG › DAGMan acts as a “meta-scheduler”, managing the submission of your jobs to Condor based on the DAG dependencies. [Diagram: DAGMan reads the .dag file and places job A in the Condor job queue.]
DAGMan Running a DAG (cont’d) › DAGMan holds & submits jobs to Condor at the appropriate times. [Diagram: after job A completes, jobs B and C enter the Condor job queue.]
DAGMan Running a DAG (cont’d) › In case of a job failure, DAGMan continues until it can no longer make progress, and then creates a “rescue” file with the current state of the DAG. [Diagram: a job fails; DAGMan writes a rescue file recording which nodes have completed.]
DAGMan Recovering a DAG › Once the failed job is ready to be re-run, the rescue file can be used to restore the prior state of the DAG. [Diagram: DAGMan reads the rescue file and resubmits the failed job to the Condor job queue.]
DAGMan Recovering a DAG (cont’d) › Once that job completes, DAGMan will continue the DAG as if the failure never happened. [Diagram: job D enters the queue once its parents are done.]
DAGMan Finishing a DAG › Once the DAG is complete, the DAGMan job itself is finished, and exits. [Diagram: the job queue is empty; the DAG is complete.]
Frieda wants more… › She decides to use the graduate students’ computers when they aren’t using them, and get her work done sooner. › In exchange, they can use the Condor pool too.
Frieda’s Condor pool… [Diagram: Frieda’s computer acts as the central manager; the graduate students’ desktop computers join the pool.]
Frieda’s Pool is Flexible › Since Frieda is a professor, her jobs are preferred. › Frieda doesn’t always have jobs, so now the graduate students have access to more computing power. › Frieda’s pool has enabled more work to be done by everyone.
How does this work? › Frieda submits a job. Condor builds a ClassAd and gives it to the Central Manager:

    Owner = "Frieda"
    MemoryUsed = 40M
    ImageSize = 20M
    Requirements = (OpSys == "Linux" && Memory > MemoryUsed)

› The Central Manager collects machine ClassAds:

    Memory = 128M
    Requirements = (ImageSize < 50M)
    Rank = (Owner == "Frieda")

› The Central Manager finds the best match
After a match is found › Central Manager tells both parties about the match › Frieda’s computer and the remote computer cooperate to run Frieda’s job.
Lots of flexibility › Machines can: Only run jobs when they have been idle for at least 15 minutes—or always run them. Kick off jobs when someone starts using the computer—or never kick them off. › Jobs can: Require or prefer certain machines Use checkpointing, remote I/O, etc. › A sample machine policy is sketched below
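As a minimal sketch, a policy like “only run jobs after 15 idle minutes” is written as ClassAd expressions in the machine’s Condor configuration (assuming the standard KeyboardIdle and LoadAvg machine attributes; KeyboardIdle is in seconds):

    # hypothetical condor_config policy
    START   = (KeyboardIdle > 15 * 60) && (LoadAvg < 0.3)
    PREEMPT = (KeyboardIdle < 60)       # vacate when the owner returns

A machine that should always run jobs would simply set START = True.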
Happy Day! Frieda’s organization purchased a Beowulf Cluster! › Other scientists in her department have realized the power of Condor and want to share it. › The Beowulf cluster and the graduate student computers can be part of a single Condor pool.
Frieda’s Condor pool… [Diagram: Frieda’s computer remains the central manager; the pool now contains the graduate students’ desktops plus the Beowulf cluster.]
Frieda’s Big Condor Pool › Jobs can prefer to run in the Beowulf cluster by using “Rank” (sketched below). › Jobs can run just on “appropriate machines” based on: Memory, disk space, software, etc. › The Beowulf cluster is dedicated. › The student computers are still useful. › Everyone’s computing power is increased.
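A sketch of such a preference in a submit file, assuming the cluster nodes advertise a hypothetical custom attribute IsCluster = True:

    # hypothetical submit-file lines
    Rank         = (IsCluster =?= True)   # prefer dedicated cluster nodes
    Requirements = (Memory > 128) && (OpSys == "LINUX")

Rank is a numeric preference: among matching machines, those where the expression evaluates higher are chosen first.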
Frieda collaborates… › She wants to share her Condor pool with scientists from another lab.
Condor Flocking › Condor pools can work cooperatively
Flocking… › Flocking is Condor-specific—you can simply link Condor pools together (a configuration sketch follows). › Jobs usually prefer running in their “native” pool before running in alternate pools. › What if you want to connect to a non-Condor pool?
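A minimal sketch of flocking configuration on Frieda’s submit machine (the pool name is hypothetical):

    # hypothetical condor_config line: if the local pool is busy,
    # try the collaborating lab's pool as well
    FLOCK_TO = condor.other-lab.example.edu

The remote pool must agree, on its side, to accept the flocked jobs.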
Condor-G › Condor-G lets you submit jobs to Grid resources. Uses Globus job submission mechanisms › You get Condor’s benefits: Fault tolerance, monitoring, etc. › You get the Grid’s benefits: Use any Grid resources
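A minimal Condor-G submit sketch (the gatekeeper contact string is hypothetical):

    # hypothetical Condor-G submit file
    universe        = globus
    globusscheduler = gatekeeper.example.edu/jobmanager-pbs
    executable      = sim
    output          = sim.out
    log             = sim.log
    queue

The job travels through Globus job submission to whatever scheduler sits behind the gatekeeper, while Condor-G tracks it locally.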
Condor as a Grid Resource › Condor can be a backend for Globus Submit Globus jobs to Condor resource The Globus jobs run in the Condor pool
Condor Summary › Condor is useful, even on a single machine or a small pool. › Condor can bring computing power to people that can’t afford a “real” cluster. › Condor can work with dedicated clusters › Condor works with the Grid › Questions so far?
ClassAds › Condor uses ClassAds internally to pair jobs with machines. Normally, you don’t need to know the details when you use Condor We saw sample ClassAds earlier. › If you like, you can also use ClassAds in your own projects.
What Are ClassAds? › A ClassAd maps attributes to expressions › Expression values can be: Constants (strings, numbers, etc.) Expressions (other.Memory > 600M) Lists ({ “roy”, “pfc”, “melski” }) Other ClassAds › Powerful tool for grid computing Semi-structured (you pick your structure) Matchmaking
ClassAd Example

    [
      Type = "Job";
      Owner = "roy";
      Universe = "Standard";
      Requirements = (other.OpSys == "Linux" &&
                      other.DiskSpace > 140M);
      Rank = (other.DiskSpace > 300M ? 10 : 1);
      ClusterID = 12314;
      JobID = 0;
      Env = "";
      …
    ]

Real ClassAds have more fields than will fit on this slide.
ClassAd Matchmaking

    [
      Type = "Job";
      Owner = "roy";
      Requirements = (other.OpSys == "Linux" &&
                      other.DiskSpace > 140M);
      Rank = (other.DiskSpace > 300M ? 10 : 1);
    ]

    [
      Type = "Machine";
      OpSys = "Linux";
      DiskSpace = 500M;
      AllowedUsers = { "roy", "melski", "pfc" };
      Requirements = IsMember(other.Owner, AllowedUsers);
    ]

These two ads match: each ad’s Requirements evaluates to true when “other” refers to the opposite ad.
ClassAds Are Open Source › GNU Lesser General Public License (LGPL) › Complete source code included Library code Test program › Available from: › Version 0.9.3
Who Uses ClassAds? › Condor › European Data Grid › NeST › Web site › …You?
ClassAd User: Condor › ClassAds describe jobs and machines › Matchmaking figures out what jobs run on which machines › DAGMan will soon internally represent DAGs as ClassAds
ClassAd User: EU Datagrid › JDL: ClassAd schema to describe jobs/machines › ResourceBroker: matches jobs to machines
ClassAd User: NeST › NeST is a storage appliance › NeST uses ClassAd collections for persistent storage of: User Information File meta-data Disk Information Lots (storage space allocations)
ClassAd User: Web Site › Web-based application in Germany › User actions (transitions) are constrained › Constraints expressed through ClassAds
ClassAd Summary › ClassAds are flexible › Matchmaking is powerful › You can use ClassAds independently of Condor:
MW = Master-Worker › Master-Worker Style Parallel Applications Large problem partitioned into small pieces (tasks); The master manages tasks and resources (worker pool); Each worker gets a task, executes it, sends the result back, and repeats until all tasks are done; Examples: ray-tracing, optimization problems, etc. › On Condor (PVM, Globus, …) Many opportunities! Issues (in a Distributed Opportunistic Environment): Resource management, communication, portability; Fault-tolerance, dealing with runtime pool changes.
MW to Simplify the Work! › An OO framework with simple interfaces 3 classes to extend, a few virtual functions to fill; Scientists can focus on their algorithms. › Lots of Functionality Handles all the issues in a meta-computing environment; Provides sufficient information to make smart decisions. › Many Choices without Changing User Code Multiple resource managers: Condor, PVM, … Multiple communication interfaces: PVM, File, Socket, …
MW’s Layered Architecture [Diagram: an MW application extends MW’s abstract classes; below the API, MW connects to the resource manager and communication layer of the underlying infrastructure through the IPI (Infrastructure Provider’s Interface).]
MW’s Runtime Structure
1. User code adds tasks to the master’s Todo list;
2. Each task is sent to a worker (Todo -> Running);
3. The task is executed by the worker;
4. The result is sent back to the master;
5. User code processes the result (can add/remove tasks).
[Diagram: a master process with ToDo and Running task lists dispatches work to a pool of worker processes.]
MW Summary › It’s simple: simple API, minimal user code. › It’s powerful: works on meta-computing platforms. › It’s inexpensive: on top of Condor, it can exploit hundreds of machines. › It solves hard problems! Nug30, STORM, …
MW Success Stories › Nug30 solved in 7 days by MW-QAP Quadratic assignment problem outstanding for 30 years Utilized 2500 machines from 10 sites (NCSA, ANL, UWisc, GaTech, …) 1009 workers at peak, 11 CPU-years › STORM (flight scheduling) Stochastic programming problem (1000M rows × 13000M columns) 2000 times larger than the best sequential program can handle 556 workers at peak, 1 CPU-year
MW Information ›
Questions So Far?
NeST › Traditional file servers have not evolved; NeST is a second-generation file server › Flexible storage appliance for the Grid Provides local and remote access to data Easy management of storage resources › User-level software turns machines into storage appliances Deployable and portable
Research Meets Production › NeST exists at an exciting intersection › Freedom to pursue academic curiosities › Opportunities to discover real user concerns
Very exciting intersection
NeST Supports Lots › A lot is a guaranteed storage allocation. › When you run your large analysis on a Grid, will you have sufficient storage for your results? › Lots ensure you have storage space.
NeST Supports Multiple Protocols › Interoperability between admin domains › NeST currently speaks: GridFTP, FTP, HTTP, NFS (beta), and Chirp › Designed for integration of new protocols
Design structure [Diagram: protocol modules (Chirp, FTP, GridFTP, NFS, HTTP) share a common protocol layer; a dispatcher coordinates the transfer manager, concurrencies, and storage manager, with control flow on top and data flow down to the physical network and storage layers.]
Why not JBOS? › “Just a bunch of servers” has limitations › NeST advantages over JBOS: Single configuration and admin interface Optimizations across multiple protocols (e.g., cache-aware scheduling) Management and control of protocols (e.g., prefer local users to remote users)
Three-Way Matching [Diagram: a job ad, machine ad, and storage ad are matched as a group; the job ad refers to NearestStorage, and the machine ad knows where NearestStorage (a NeST server) is.]
Three-way ClassAds

    Job ClassAd:
    [
      Type = "job";  TargetType = "machine";
      Cmd = "sim.exe";
      Owner = "thain";
      Requirements = (OpSys == "linux") && NearestStorage.HasCMSData;
    ]

    Machine ClassAd:
    [
      Type = "machine";  TargetType = "job";
      OpSys = "linux";
      Requirements = (Owner == "thain");
      NearestStorage = (Name == "turkey") && (Type == "Storage");
    ]

    Storage ClassAd:
    [
      Type = "storage";
      Name = "turkey.cs.wisc.edu";
      HasCMSData = true;
      CMSDataPath = "/cmsdata";
    ]
NeST Information › Version 0.9 now available (Linux only, no NFS) › Solaris and NFS coming soon › Requests welcome
DaP Scheduler › Intelligent scheduling of data transfers
Applications Demand Storage › Database systems › Multimedia applications › Scientific applications: High Energy Physics & Computational Genomics (currently terabytes, soon petabytes of data)
Is remote access good enough? › Huge amounts of data (mostly on tape) › Large number of users › Distance / low bandwidth › Different platforms › Scalability and efficiency concerns => Middleware is required
Two approaches › Move job/application to the data Less common Insufficient computational power on storage site Not efficient Does not scale › Move data to the job/application
Move data to the Job [Diagram: data flows from a huge tape library (terabytes) through a remote staging area, across the WAN, to a local storage area (e.g., local disk or a NeST server) on the compute cluster’s LAN.]
Main Issues › 1. Insufficient local storage area › 2. CPU should not wait much for I/O › 3. Crash Recovery › 4. Different Platforms & Protocols › 5. Make it simple
Data Placement Scheduler (DaPS) › Intelligently manages and schedules Data Placement (DaP) activities/jobs › What Condor is for computational jobs, DaPS is for DaP jobs › Just submit a bunch of DaP jobs and then relax (a sketch follows)
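As a sketch, a DaP job request might look like this ClassAd-style description (the request format, attribute names, and URLs are hypothetical, mirroring the ClassAd syntax used elsewhere in Condor):

    [
      dap_type = "transfer";
      src_url  = "gsiftp://remote.example.edu/data/run42.in";
      dest_url = "nest://turkey.cs.wisc.edu/cmsdata/run42.in";
    ]

DaPS would queue, schedule, and retry such transfers the way Condor does for computational jobs.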
Supported Protocols › Currently supported: FTP GridFTP NeST (chirp) SRB (Storage Resource Broker) › Very soon: SRM (Storage Resource Manager) GDMP (Grid Data Management Pilot)
Case Study: DAGMan [Diagram: DAGMan reads the .dag file and submits the DAG’s jobs (A, B, C, D) to the Condor job queue.]
Current DAG structure › All jobs are assumed to be computational jobs [Diagram: a DAG of four computational jobs, A through D.]
Current DAG structure › If data transfer to/from remote sites is required, this is performed via pre- and post-scripts attached to each job. [Diagram: the same DAG, with PRE and POST scripts wrapped around Job B.]
New DAG structure › Add DaP jobs to the DAG structure itself (a sketch follows) [Diagram: the PRE/POST scripts around Job B become first-class DAG nodes: reserve space in & out, transfer in, run Job B, transfer out, release in, release out.]
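A hypothetical sketch of such a mixed DAG file (the DATA keyword and file names are assumptions, by analogy with DAGMan’s JOB/PARENT syntax):

    # hypothetical DAG mixing computation and data placement
    DATA IN  transfer_in.dap     # DaP job: stage input in
    JOB  B   b.submit            # computational job
    DATA OUT transfer_out.dap    # DaP job: stage output out
    PARENT IN CHILD B
    PARENT B CHILD OUT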
New DAGMan Architecture [Diagram: DAGMan reads the .dag file and submits computational jobs (A, B, C, D) to the Condor job queue and data placement jobs (X, Y) to the DaPS job queue.]
DaP Conclusion › More intelligent management of remote data transfer & staging: increases local storage utilization, maximizes CPU throughput
Questions So Far?
Hawkeye › Sys admins first need information about what is happening on the machines they are responsible for. Both current and past Information must be consolidated and easily accessible Information must be dynamic
[Diagram: HawkEye monitoring agents on many machines report to a central HawkEye manager.]
HawkEye Monitoring Agent [Diagram: on each machine, Hawkeye_Startup_Agent runs Hawkeye_Monitor, which gathers data from /proc, kstat, etc., and sends ClassAd updates to the HawkEye manager.]
Monitor Agent, cont. › Updates are sent periodically Information does not get stale › Updates also serve as a heartbeat monitor Know when a machine is down › Out of the box, the update ClassAd has many attributes about the machine of interest for system administration Current Prototype = about 200 attributes
Custom Attributes [Diagram: the same agent pipeline, extended with data injected via the hawkeye_update_attribute command-line tool.] › Create your own HawkEye plugins, or share plugins with others
Role of HawkEye Manager › Store all incoming ClassAds in an indexed resident data structure Fast response to client tool queries about current state “Show me all machines with a load average > 10” › Periodically store ClassAd attributes into a Round Robin Database Store information over time “Show me a graph with the load average for this machine over the past week” › Speak to clients via CEDAR, HTTP
Web client › Command-line, GUI, Web-based
Running tasks on behalf of the sys admin › Submit your sys admin tasks to HawkEye Tasks are stored in a persistent queue by the Manager Tasks can leave the queue upon completion, or repeat after specified intervals Tasks can have complex interdependencies via DAGMan Records are kept on which task ran where › Sounds like Condor, eh? Yes, but simpler…
Run Tasks in response to monitoring information › ClassAd “Requirements” Attribute › Example: send an alert if a machine is low on disk space or low on swap space Submit a task with an attribute: Requirements = free_disk < 5 || free_swap < 5 › Example w/ task interdependency: If the load average is high, the OS is Linux, and the console is idle, submit a task that runs “top”; if top sees Netscape, submit a task to kill Netscape › A sketch of such a task follows
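As a sketch, such a task description could be a ClassAd like this (the Cmd path and every attribute other than Requirements are hypothetical):

    [
      Cmd = "/usr/local/hawkeye/send_alert.sh";   # hypothetical alert script
      Requirements = (free_disk < 5) || (free_swap < 5);
    ]

The Manager would run the task on any machine whose monitoring ClassAd satisfies the Requirements expression.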
Today’s Summary › Condor works on many levels Small pools can make a big difference Big pools are for the really big problems Condor works in the Grid › Condor is assisted by a host of technologies: ClassAds, Checkpointing, Remote I/O DAGMan, Master-Worker, NeST, DaPScheduler, Hawkeye
Questions? Comments? › Web: ›