
1 Alain Roy Computer Sciences Department University of Wisconsin-Madison roy@cs.wisc.edu http://www.cs.wisc.edu/condor 23-June-2002 Introduction to Condor

2 www.cs.wisc.edu/condor Good morning! (Доброе утро!) › Thank you for having me! › I am:  Alain Roy  Computer Science Ph.D. in Quality of Service, with the Globus Project  Working with the Condor Project

3 www.cs.wisc.edu/condor Condor Tutorials › Today (Sunday) 10:00-12:30  A general introduction to Condor › Monday 17:00-19:00  Using and administering Condor › Tuesday 17:00-19:00  Using Condor on the Grid

4 www.cs.wisc.edu/condor A General Introduction to Condor

5 www.cs.wisc.edu/condor The Condor Project (Established 1985) Distributed Computing research performed by a team of about 30 faculty, full-time staff, and students who:  face software engineering challenges in a Unix and Windows environment,  are involved in national and international collaborations,  actively interact with users,  maintain and support a distributed production environment,  and educate and train students.

6 www.cs.wisc.edu/condor A Multifaceted Project › Harnessing clusters—opportunistic and dedicated (Condor) › Job management for Grid applications (Condor-G, DaPSched) › Fabric management for Grid resources (Condor, GlideIns, NeST) › Distributed I/O technology (PFS, Kangaroo, NeST) › Job-flow management (DAGMan, Condor) › Distributed monitoring and management (HawkEye) › Technology for Distributed Systems (ClassAd, MW)

7 www.cs.wisc.edu/condor Harnessing Computers › We have more than 300 pools with more than 8500 CPUs worldwide. › We have more than 1800 CPUs in 10 pools on our campus. › Established a “complete” production environment for the UW CMS group › Adopted by the “real world” (Galileo, Maxtor, Micron, Oracle, Tigr, … )

8 www.cs.wisc.edu/condor The Grid… › Close collaboration and coordination with the Globus Project—joint development, adoption of common protocols, technology exchange, … › Partner in major national Grid R&D² (Research, Development and Deployment) efforts (GriPhyN, iVDGL, IPG, TeraGrid) › Close collaboration with Grid projects in Europe (EDG, GridLab, e-Science)

9 www.cs.wisc.edu/condor [Diagram: a three-layer view with User/Application on top, the Grid in the middle, and the Fabric (processing, storage, communication) at the bottom]

10 www.cs.wisc.edu/condor [Diagram: the same three layers, with Condor at the User/Application level, the Globus Toolkit providing the Grid layer, and Condor again managing the Fabric]

11 www.cs.wisc.edu/condor distributed I/O … › Close collaboration with the Scientific Data Management Group at LBL. › Provide management services for distributed data storage resources › Provide management and scheduling services for Data Placement jobs (DaPs) › Effective, secure and flexible remote I/O capabilities › Exception handling

12 www.cs.wisc.edu/condor job flow management … › Adoption of Directed Acyclic Graphs (DAGs) as a common job flow abstraction. › Adoption of the DAGMan as an effective solution to job flow management.

13 www.cs.wisc.edu/condor For the Rest of Today › Condor › Condor and the Grid › Related Technologies  DAGMan  ClassAds  Master-Worker  NeST  DaP Scheduler  Hawkeye › Today: Just the “Big Picture”

14 www.cs.wisc.edu/condor What is Condor? › Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility.  Run lots of jobs over a long period of time,  Not a short burst of “high-performance” › Condor manages both machines and jobs with ClassAd Matchmaking to keep everyone happy

15 www.cs.wisc.edu/condor Condor Takes Care of You › Condor does whatever it takes to run your jobs, even if some machines…  Crash (or are disconnected)  Run out of disk space  Don’t have your software installed  Are frequently needed by others  Are far away & managed by someone else

16 www.cs.wisc.edu/condor What is Unique about Condor? › ClassAds › Transparent checkpoint/restart › Remote system calls › Works in heterogeneous clusters › Clusters can be:  Dedicated  Opportunistic

17 www.cs.wisc.edu/condor What’s Condor Good For? › Managing a large number of jobs  You specify the jobs in a file and submit them to Condor, which runs them all and sends you email when they complete  Mechanisms to help you manage huge numbers of jobs (1000’s), all the data, etc.  Condor can handle inter-job dependencies (DAGMan)

18 www.cs.wisc.edu/condor What’s Condor Good For? (cont’d) › Robustness  Checkpointing allows guaranteed forward progress of your jobs, even jobs that run for weeks before completion  If an execute machine crashes, you only lose work done since the last checkpoint  Condor maintains a persistent job queue - if the submit machine crashes, Condor will recover  (Story)

19 www.cs.wisc.edu/condor What’s Condor Good For? (cont’d) › Giving your job the agility to access more computing resources  Checkpointing allows your job to run on “opportunistic resources” (not dedicated)  Checkpointing also provides “migration” - if a machine is no longer available, move!  With remote system calls, run on systems which do not share a filesystem - You don’t even need an account on a machine where your job executes

20 www.cs.wisc.edu/condor Other Condor features › Implement your policy on when the jobs can run on your workstation › Implement your policy on the execution order of the jobs › Keep a log of your job activities

21 www.cs.wisc.edu/condor A Condor Pool In Action

22 www.cs.wisc.edu/condor A Bit of Condor Philosophy › Condor brings more computing to everyone  A small-time scientist can make an opportunistic pool with 10 machines, and get 10 times as much computing done.  A large collaboration can use Condor to control its dedicated pool with hundreds of machines.

23 www.cs.wisc.edu/condor The Condor Idea Computing power is everywhere, we try to make it usable by anyone.

24 www.cs.wisc.edu/condor Meet Frieda. She is a scientist. But she has a big problem.

25 www.cs.wisc.edu/condor Frieda’s Application … Simulate the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600 combinations)  F takes, on average, 3 hours to compute on a “typical” workstation ( total = 1800 hours )  F requires a “moderate” (128MB) amount of memory  F performs “moderate” I/O - (x,y,z) is 5 MB and F(x,y,z) is 50 MB

26 www.cs.wisc.edu/condor I have 600 simulations to run. Where can I get help?

27 www.cs.wisc.edu/condor Install a Personal Condor!

28 www.cs.wisc.edu/condor Installing Condor › Download Condor for your operating system › Available as a free download from http://www.cs.wisc.edu/condor › Not labelled as “Personal” Condor, just “Condor”. › Available for most Unix platforms and Windows NT

29 www.cs.wisc.edu/condor So Frieda Installs Personal Condor on her machine… › What do we mean by a “Personal” Condor?  Condor on your own workstation, no root access required, no system administrator intervention needed—easy to set up. › So after installation, Frieda submits her jobs to her Personal Condor…
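To make slide 29 concrete, here is a minimal sketch of the submit description file Frieda might write for her 600 runs. The executable name (sim_f) and the input/output naming scheme are hypothetical, and the standard universe assumes the program has been relinked with condor_compile.

    # Hypothetical submit description file for the 600 runs of F(x,y,z).
    # The standard universe provides checkpointing and remote system calls;
    # the program must first be relinked with condor_compile.
    Universe    = standard
    Executable  = sim_f
    # $(Process) runs from 0 to 599 and selects the (x,y,z) combination
    Arguments   = $(Process)
    Input       = input.$(Process)
    Output      = output.$(Process)
    Error       = error.$(Process)
    Log         = sim_f.log
    Queue 600

Running condor_submit on this file places all 600 jobs in the queue at once; condor_q shows their progress, and Condor sends email as jobs complete.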

30 www.cs.wisc.edu/condor Personal Condor?! What’s the benefit of a Condor “Pool” with just one user and one machine?

31 www.cs.wisc.edu/condor Your Personal Condor will... › Keep an eye on your jobs and will keep you posted on their progress › Keep a log of your job activities › Add fault tolerance to your jobs › Implement your policy on when the jobs can run on your workstation

32 www.cs.wisc.edu/condor Frieda is happy until… She realizes she needs to run a post-analysis on each job, after it completes.

33 www.cs.wisc.edu/condor Condor DAGMan › Directed Acyclic Graph Manager › DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you. › (e.g., “Don’t run job B until job A has completed successfully.”)

34 www.cs.wisc.edu/condor What is a DAG? › A DAG is the data structure used by DAGMan to represent these dependencies. › Each job is a “node” in the DAG. › Each node can have any number of “parent” or “children” nodes – as long as there are no loops! [Diagram: a four-node DAG in which Job A is the parent of Jobs B and C, and Jobs B and C are both parents of Job D]
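As a sketch, the four-node DAG pictured on slide 34 could be described to DAGMan with an input file like the following (the submit-file names are hypothetical):

    # Hypothetical DAG input file (diamond.dag) for the DAG above
    JOB A a.submit
    JOB B b.submit
    JOB C c.submit
    JOB D d.submit
    PARENT A CHILD B C
    PARENT B C CHILD D

It would be submitted with condor_submit_dag diamond.dag; DAGMan then submits each node's job only after its parents have finished.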

35 www.cs.wisc.edu/condor DAGMan Running a DAG › DAGMan acts as a “meta-scheduler”, managing the submission of your jobs to Condor based on the DAG dependencies. [Diagram: DAGMan reads the .dag file and places job A in the Condor job queue; jobs B, C, and D are not yet submitted]

36 www.cs.wisc.edu/condor DAGMan Running a DAG (cont’d) › DAGMan holds & submits jobs to Condor at the appropriate times. [Diagram: once A completes, DAGMan submits jobs B and C to the Condor job queue while D is still held]

37 www.cs.wisc.edu/condor DAGMan Running a DAG (cont’d) › In case of a job failure, DAGMan continues until it can no longer make progress, and then creates a “rescue” file with the current state of the DAG. [Diagram: one job has failed (marked with an X); DAGMan stops submitting and writes a rescue file recording the DAG's state]
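For the diamond DAG sketched above, the rescue file might look roughly like this if node C failed after A and B had completed (submit-file names remain hypothetical); completed nodes are marked DONE so they are not re-run:

    # Hypothetical rescue DAG written after node C failed
    JOB A a.submit DONE
    JOB B b.submit DONE
    JOB C c.submit
    JOB D d.submit
    PARENT A CHILD B C
    PARENT B C CHILD D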

38 www.cs.wisc.edu/condor DAGMan Recovering a DAG › Once the failed job is ready to be re-run, the rescue file can be used to restore the prior state of the DAG. [Diagram: DAGMan reads the rescue file and resubmits the failed job C to the Condor job queue]

39 www.cs.wisc.edu/condor DAGMan Recovering a DAG (cont’d) › Once that job completes, DAGMan will continue the DAG as if the failure never happened. [Diagram: with C complete, DAGMan submits job D to the Condor job queue]

40 www.cs.wisc.edu/condor DAGMan Finishing a DAG › Once the DAG is complete, the DAGMan job itself is finished, and exits. [Diagram: all four nodes are done and the Condor job queue is empty]

41 www.cs.wisc.edu/condor Frieda wants more… › She decides to use her graduate students’ computers when they aren’t being used, so she can get done sooner. › In exchange, they can use the Condor pool too.

42 www.cs.wisc.edu/condor Frieda’s Condor pool… [Diagram: Frieda’s computer acts as the Central Manager, connected to the graduate students’ desktop computers]

43 www.cs.wisc.edu/condor Frieda’s Pool is Flexible › Since Frieda is a professor, her jobs are preferred. › Frieda doesn’t always have jobs, so now the graduate students have access to more computing power. › Frieda’s pool has enabled more work to be done by everyone.

44 www.cs.wisc.edu/condor How does this work? › Frieda submits a job. Condor makes a ClassAd and gives it to the Central Manager:  Owner = “Frieda”  MemoryUsed = 40M  ImageSize=20M  Requirements=(Opsys==“Linux” && Memory > MemoryUsed) › Central Manager collects machine ClassAds:  Memory=128M  Requirements=(ImageSize < 50M)  Rank=(Owner==“Frieda”) › Central Manager finds the best match

45 www.cs.wisc.edu/condor After a match is found › Central Manager tells both parties about the match › Frieda’s computer and the remote computer cooperate to run Frieda’s job.

46 www.cs.wisc.edu/condor Lots of flexibility › Machines can:  Only run jobs when they have been idle for at least 15 minutes—or always run them.  Kick off jobs when someone starts using the computer—or never kick them off. › Jobs can:  Require or prefer certain machines  Use checkpointing, remote I/O, etc…
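On the machine side, such policies are written as expressions in the Condor configuration. A minimal sketch, using attribute names from Condor's stock example configuration (illustrative only, not a tested policy):

    # Start jobs only after 15 minutes of keyboard idle time
    START    = KeyboardIdle > 15 * $(MINUTE)
    # Suspend a running job as soon as the owner returns
    SUSPEND  = KeyboardIdle < $(MINUTE)
    CONTINUE = KeyboardIdle > 5 * $(MINUTE)
    # Never kick jobs off this machine entirely
    PREEMPT  = False

On the job side, Requirements and Rank expressions in the submit file state which machines are acceptable and which are preferred.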

47 www.cs.wisc.edu/condor Happy Day! Frieda’s organization purchased a Beowulf Cluster! › Other scientists in her department have realized the power of Condor and want to share it. › The Beowulf cluster and the graduate student computers can be part of a single Condor pool.

48 www.cs.wisc.edu/condor Frieda’s Condor pool… [Diagram: Frieda’s computer remains the Central Manager, now managing both the graduate students’ desktop computers and the Beowulf cluster]

49 www.cs.wisc.edu/condor Frieda’s Big Condor Pool › Jobs can prefer to run in the Beowulf cluster by using “Rank”. › Jobs can run just on “appropriate machines” based on:  Memory, disk space, software, etc. › The Beowulf cluster is dedicated. › The student computers are still useful. › Everyone’s computing power is increased.
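A hedged sketch of how a submit file might express this preference, assuming the cluster nodes advertise a custom ClassAd attribute IsBeowulf = True (that attribute is an assumption for illustration, not something Condor defines):

    # Run only on suitable machines; prefer the dedicated cluster.
    # IsBeowulf is a hypothetical attribute added by the administrator.
    Requirements = (OpSys == "LINUX") && (Memory >= 128)
    Rank         = (IsBeowulf =?= True)

Because =?= (meta-equals) evaluates to False rather than undefined on machines that lack the attribute, desktop machines stay usable; they are simply ranked lower than the cluster.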

50 www.cs.wisc.edu/condor Frieda collaborates… › She wants to share her Condor pool with scientists from another lab.

51 www.cs.wisc.edu/condor Condor Flocking › Condor pools can work cooperatively

52 www.cs.wisc.edu/condor Flocking… › Flocking is Condor-specific—you can just link Condor pools together › Jobs usually prefer running in their “native” pool, before running in alternate pools. › What if you want to connect to a non-Condor pool?

53 www.cs.wisc.edu/condor Condor-G › Condor-G lets you submit jobs to Grid resources.  Uses Globus job submission mechanisms › You get Condor’s benefits:  Fault tolerance, monitoring, etc. › You get the Grid’s benefits:  Use any Grid resources
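A minimal Condor-G submit file sketch, assuming a Globus gatekeeper at the hypothetical host gatekeeper.example.edu:

    # Hypothetical Condor-G submit file: run a job on a Globus resource
    Universe        = globus
    GlobusScheduler = gatekeeper.example.edu/jobmanager
    Executable      = analysis
    Output          = analysis.out
    Log             = analysis.log
    Queue

Apart from the globus universe and the GlobusScheduler line, the job is handled like any other Condor job: condor_q and the log file track it, and the persistent queue provides the fault tolerance mentioned above.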

54 www.cs.wisc.edu/condor Condor as a Grid Resource › Condor can be a backend for Globus  Submit Globus jobs to Condor resource  The Globus jobs run in the Condor pool

55 www.cs.wisc.edu/condor Condor Summary › Condor is useful, even on a single machine or a small pool. › Condor can bring computing power to people that can’t afford a “real” cluster. › Condor can work with dedicated clusters › Condor works with the Grid › Questions so far?

56 www.cs.wisc.edu/condor ClassAds › Condor uses ClassAds internally to pair jobs with machines.  Normally, you don’t need to know the details when you use Condor  We saw sample ClassAds earlier. › If you like, you can also use ClassAds in your own projects.

57 www.cs.wisc.edu/condor What Are ClassAds? › A ClassAd maps attributes to expressions › Expressions  Constants: strings, numbers, etc.  Expressions: other.Memory > 600M  Lists: { “roy”, “pfc”, “melski” }  Other ClassAds › Powerful tool for grid computing  Semi-structured (you pick your structure)  Matchmaking

58 www.cs.wisc.edu/condor ClassAd Example [ Type = “Job”; Owner = “roy”; Universe = “Standard”; Requirements = (other.OpSys == “Linux” && other.DiskSpace > 140M); Rank = (other.DiskSpace > 300M ? 10 : 1); ClusterID = 12314; JobID = 0; Env = “”; … ] Real ClassAds have more fields than will fit on this slide.

59 www.cs.wisc.edu/condor ClassAd Matchmaking [ Type = “Job”; Owner = “roy”; Requirements = (other.OpSys == “Linux” && other.DiskSpace > 140M); Rank = (other.DiskSpace > 300M ? 10 : 1); ] [ Type = “Machine”; OpSys = “Linux”; DiskSpace = 500M; AllowedUsers = {“roy”, “melski”, “pfc”}; Requirements = IsMember(other.Owner, AllowedUsers); ]

60 www.cs.wisc.edu/condor ClassAds Are Open Source › GNU Lesser General Public License (LGPL) › Complete source code included  Library code  Test program › Available from: http://www.cs.wisc.edu/condor/classad › Version 0.9.3

61 www.cs.wisc.edu/condor Who Uses ClassAds? › Condor › European Data Grid › NeST › Web site › …You?

62 www.cs.wisc.edu/condor ClassAd User: Condor › ClassAds describe jobs and machines › Matchmaking figures out what jobs run on which machines › DAGMan will soon internally represent DAGs as ClassAds

63 www.cs.wisc.edu/condor ClassAd User: EU Datagrid › JDL: ClassAd schema to describe jobs/machines › ResourceBroker: matches jobs to machines

64 www.cs.wisc.edu/condor ClassAd User: NeST › NeST is a storage appliance › NeST uses ClassAd collections for persistent storage of:  User Information  File meta-data  Disk Information  Lots (storage space allocations)

65 www.cs.wisc.edu/condor ClassAd User: Web Site › Web-based application in Germany › User actions (transitions) are constrained › Constraints expressed through ClassAds

66 www.cs.wisc.edu/condor ClassAd Summary › ClassAds are flexible › Matchmaking is powerful › You can use ClassAd independently of Condor: http://www.cs.wisc.edu/condor/classad/

67 www.cs.wisc.edu/condor MW = Master-Worker › Master-Worker Style Parallel Applications  Large problem partitioned into small pieces (tasks);  The master manages tasks and resources (the worker pool);  Each worker gets a task, executes it, sends the result back, and repeats until all tasks are done;  Examples: ray-tracing, optimization problems, etc. › On Condor (PVM, Globus, …)  Many opportunities!  Issues (in a Distributed Opportunistic Environment): Resource management, communication, portability; Fault-tolerance, dealing with runtime pool changes.

68 www.cs.wisc.edu/condor MW to Simplify the Work! › An OO framework with simple interfaces  3 classes to extend, a few virtual functions to fill;  Scientists can focus on their algorithms. › Lots of Functionality  Handles all the issues in a meta-computing environment;  Provides sufficient info. to make smart decisions. › Many Choices without Changing User Code  Multiple resource managers: Condor, PVM, …  Multiple communication interfaces: PVM, File, Socket, …

69 www.cs.wisc.edu/condor MW’s Layered Architecture [Diagram: the user’s application classes sit on top of MW’s abstract classes via the API; through the IPI (Infrastructure Provider’s Interface), MW rests on a resource manager and a communication layer, which in turn use the underlying infrastructure]

70 www.cs.wisc.edu/condor MW’s Runtime Structure 1. User code adds tasks to the master’s Todo list; 2. Each task is sent to a worker (Todo -> Running); 3. The task is executed by the worker; 4. The result is sent back to the master; 5. User code processes the result (and can add/remove tasks). [Diagram: a master process holding Todo and Running task lists, connected to a pool of worker processes]

71 www.cs.wisc.edu/condor MW Summary › It’s simple:  simple API, minimal user code. › It’s powerful:  works on meta-computing platforms. › It’s inexpensive:  On top of Condor, it can exploit hundreds of machines. › It solves hard problems!  Nug30, STORM, …

72 www.cs.wisc.edu/condor MW Success Stories › Nug30 solved in 7 days by MW-QAP  A quadratic assignment problem outstanding for 30 years  Utilized 2500 machines from 10 sites (NCSA, ANL, UWisc, GaTech, INFN (Italy), …)  1009 workers at peak, 11 CPU-years  http://www-unix.mcs.anl.gov/metaneos/nug30/ › STORM (flight scheduling)  Stochastic programming problem (1000M rows x 13000M columns)  2K times larger than the best sequential programs can handle  556 workers at peak, 1 CPU-year  http://www.cs.wisc.edu/~swright/stochastic/atr/

73 www.cs.wisc.edu/condor MW Information › http://www.cs.wisc.edu/condor/mw/

74 www.cs.wisc.edu/condor Questions So Far?

75 www.cs.wisc.edu/condor NeST › Traditional file servers have not evolved  NeST is a second-generation file server › Flexible storage appliance for the grid  Provides local and remote access to data  Easy management of storage resources › User-level software turns machines into storage appliances  Deployable and portable

76 www.cs.wisc.edu/condor Research Meets Production › NeST exists at an exciting intersection › Freedom to pursue academic curiosities › Opportunities to discover real user concerns

77 www.cs.wisc.edu/condor Very exciting intersection

78 www.cs.wisc.edu/condor NeST Supports Lots › A lot is a guaranteed storage allocation. › When you run your large analysis on a Grid, will you have sufficient storage for your results? › Lots ensure you have storage space.

79 www.cs.wisc.edu/condor NeST Supports Multiple Protocols › Interoperability between admin domains › NeST currently speaks  GridFTP and FTP  HTTP  NFS (beta)  Chirp › Designed for integration of new protocols

80 www.cs.wisc.edu/condor NeST Design Structure [Diagram: a common protocol layer (Chirp, FTP, GridFTP, NFS, HTTP) feeds a dispatcher; transfer manager, concurrency, and storage manager components handle the control and data flow over the physical network and physical storage layers]

81 www.cs.wisc.edu/condor Why not JBOS? › Just a bunch of servers has limitations › NeST advantages over JBOS:  Single config and admin interface  Optimizations across multiple protocols, e.g. cache-aware scheduling  Management and control of protocols, e.g. prefer local users to remote users

82 www.cs.wisc.edu/condor Three-Way Matching [Diagram: a job ad, a machine ad, and a NeST storage ad are matched together; the job ad refers to NearestStorage, and the machine ad knows where its NearestStorage is]

83 www.cs.wisc.edu/condor Three-way ClassAds › Job ClassAd: [ Type = “job”; TargetType = “machine”; Cmd = “sim.exe”; Owner = “thain”; Requirements = (OpSys==“linux”) && NearestStorage.HasCMSData ] › Machine ClassAd: [ Type = “machine”; TargetType = “job”; OpSys = “linux”; Requirements = (Owner==“thain”); NearestStorage = (Name == “turkey”) && (Type==“Storage”) ] › Storage ClassAd: [ Type = “storage”; Name = “turkey.cs.wisc.edu”; HasCMSData = true; CMSDataPath = “/cmsdata” ]

84 www.cs.wisc.edu/condor NeST Information › http://www.cs.wisc.edu/condor/nest  Version 0.9 now available (linux only, no NFS)  Solaris and NFS coming soon  Requests welcome

85 www.cs.wisc.edu/condor DaP Scheduler › Intelligent scheduling of data transfers

86 www.cs.wisc.edu/condor Applications Demand Storage › Database systems › Multimedia applications › Scientific applications  High Energy Physics & Computational Genomics  Currently terabytes, soon petabytes of data

87 www.cs.wisc.edu/condor Is Remote access good enough? › Huge amounts of data (mostly on tapes) › Large number of users › Distance / Low Bandwidth › Different platforms › Scalability and efficiency concerns => Middleware is required

88 www.cs.wisc.edu/condor Two approaches › Move job/application to the data  Less common  Insufficient computational power on storage site  Not efficient  Does not scale › Move data to the job/application

89 www.cs.wisc.edu/condor Move data to the Job [Diagram: data flows from a huge tape library (terabytes) across the WAN to a remote staging area, then to a local storage area (e.g. local disk or a NeST server) on the LAN beside the compute cluster]

90 www.cs.wisc.edu/condor Main Issues › 1. Insufficient local storage area › 2. CPU should not wait much for I/O › 3. Crash Recovery › 4. Different Platforms & Protocols › 5. Make it simple

91 www.cs.wisc.edu/condor Data Placement Scheduler (DaPS) › Intelligently manages and schedules Data Placement (DaP) activities/jobs › What Condor is to computational jobs, DaPS is to DaP jobs › Just submit a bunch of DaP jobs and then relax.

92 www.cs.wisc.edu/condor Supported Protocols › Currently supported:  FTP  GridFTP  NeST (chirp)  SRB (Storage Resource Broker) › Very soon:  SRM (Storage Resource Manager)  GDMP (Grid Data Management Pilot)

93 www.cs.wisc.edu/condor Case Study: DAGMan [Diagram: a .dag file feeds DAGMan, which submits the DAG’s jobs (A, B, C, D) into the Condor job queue]

94 www.cs.wisc.edu/condor Current DAG structure › All jobs are assumed to be computational jobs. [Diagram: the four-node DAG of Jobs A, B, C, and D]

95 www.cs.wisc.edu/condor Current DAG structure › If data transfer to/from remote sites is required, this is performed via pre- and post-scripts attached to each job (see the sketch below). [Diagram: the same DAG, with a PRE script and a POST script attached to Job B]
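In DAGMan's existing syntax, that approach looks roughly like the fragment below (the script names are made up for illustration):

    # Hypothetical fragment: stage data in and out around Job B
    JOB B b.submit
    SCRIPT PRE  B stage_in.sh
    SCRIPT POST B stage_out.sh

The next slide replaces these opaque scripts with data placement jobs that the scheduler can see and manage directly.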

96 www.cs.wisc.edu/condor New DAG structure › Add DaP jobs to the DAG structure: the PRE and POST scripts around Job B become explicit DaP nodes: transfer in, reserve space for input and output, run Job B, transfer out, release the input space, and release the output space.

97 www.cs.wisc.edu/condor New DAGMan Architecture [Diagram: a .dag file feeds DAGMan, which now submits computational jobs (A, B, C, D) to the Condor job queue and data placement jobs (X, Y) to the DaPS job queue]

98 www.cs.wisc.edu/condor DaP Conclusion › More intelligent management of remote data transfer & staging  increase local storage utilization  maximize CPU throughput

99 www.cs.wisc.edu/condor Questions So Far?

100 www.cs.wisc.edu/condor Hawkeye › Sys admins first need information about what is happening on the machines they are responsible for.  Both current and past  Information must be consolidated and easily accessible  Information must be dynamic

101 www.cs.wisc.edu/condor [Diagram: HawkEye Monitoring Agents on each machine report to a central HawkEye Manager]

102 www.cs.wisc.edu/condor [Diagram: inside a HawkEye Monitoring Agent, the Hawkeye_Startup_Agent and Hawkeye_Monitor gather data from /proc, kstat, etc., and send ClassAd updates to the HawkEye Manager]

103 www.cs.wisc.edu/condor Monitor Agent, cont. › Updates are sent periodically  Information does not get stale › Updates also serve as a heartbeat monitor  Know when a machine is down › Out of the box, the update ClassAd has many attributes about the machine of interest for system administration  Current Prototype = about 200 attributes

104 www.cs.wisc.edu/condor Custom Attributes [Diagram: the same agent architecture, now also accepting data from the hawkeye_update_attribute command-line tool] › Create your own HawkEye plugins, or share plugins with others

105 www.cs.wisc.edu/condor Role of HawkEye Manager › Store all incoming ClassAds in an indexed resident data structure  Fast response to client tool queries about current state  “Show me all machines with a load average > 10” › Periodically store ClassAd attributes into a Round Robin Database  Store information over time  “Show me a graph with the load average for this machine over the past week” › Speak to clients via CEDAR, HTTP

106 Web client › Command-line, GUI, and Web-based clients: http://www.cs.wisc.edu/~roy/hawkeye/

107 www.cs.wisc.edu/condor Running tasks on behalf of the sys admin › Submit your sys admin tasks to HawkEye  Tasks are stored in a persistent queue by the Manager  Tasks can leave the queue upon completion, or repeat after specified intervals  Tasks can have complex interdependencies via DAGMan  Records are kept on which task ran where › Sounds like Condor, eh?  Yes, but simpler…

108 www.cs.wisc.edu/condor Run Tasks in response to monitoring information › ClassAd “Requirements” Attribute › Example: Send email if a machine is low on disk space or low on swap space  Submit an email task with an attribute: Requirements = free_disk < 5 || free_swap < 5 › Example with task interdependency: if the load average is high, the OS is Linux, and the console is idle, submit a task that runs “top”; if top sees Netscape, submit a task to kill Netscape

109 www.cs.wisc.edu/condor Today’s Summary › Condor works on many levels  Small pools can make a big difference  Big pools are for the really big problems  Condor works in the Grid › Condor is assisted by a host of technologies:  ClassAds, Checkpointing, Remote I/O, DAGMan, Master-Worker, NeST, DaPScheduler, Hawkeye

110 www.cs.wisc.edu/condor Questions? Comments? › Web: www.cs.wisc.edu/condor › Email: condor-admin@cs.wisc.edu

