Condor and the Grid D. Thain, T. Tannenbaum, M. Livny Christopher M. Moretti 23 February 2007.

1 Condor and the Grid
D. Thain, T. Tannenbaum, M. Livny
Christopher M. Moretti
23 February 2007

2 Problem & Opportunity
- Users need CPUs
  - Scientific computing
  - Mathematical modeling
  - Data mining
- Many CPU cycles are unused
  - Personal workstations
  - General-use laboratories
  - Research machines

3 Solution: Condor
- “A hunter of idle workstations”
  - Keeps track of resources needed and available
  - Determines and assigns matches
  - Monitors progress
  - Cleans up and reports results

4 Architecture
- Three principals:
  - Agent: machine needing resources
  - Matchmaker
  - Resource: machine lending resources
- Three phases:
  - Advertising
  - Matching/Claiming
  - Deploying/Executing

5 Advertising
[Diagram: the Agent (needy.cse.nd.edu) tells the MatchMaker "I need X"; the Lender (idle.cse.nd.edu) tells the MatchMaker "I have Y"; the MatchMaker asks "Does Y satisfy X?"]

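6 Matching & Claiming
[Diagram: the MatchMaker tells the Agent (needy.cse.nd.edu) "Use idle.cse.nd.edu" and tells the Lender (idle.cse.nd.edu) "Listen for needy.cse.nd.edu"; the Agent then asks the Lender "Are you still available?" and the Lender answers "Yes."]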
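
The advertise, match, and claim phases on slides 4 to 6 can be sketched as a small Python simulation. The class and function names (`Resource`, `Matchmaker`, `run_agent`) are hypothetical and are not part of Condor. The sketch only illustrates one point: the matchmaker introduces the two parties from a possibly stale advertisement, so the agent still has to confirm the claim with the resource itself.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A lending machine (hypothetical model, not Condor's API)."""
    host: str
    idle: bool = True

    def claim(self, agent_host: str) -> bool:
        # The resource has the final say: it may have become busy
        # since it last advertised to the matchmaker.
        if not self.idle:
            return False
        self.idle = False
        return True


@dataclass
class Matchmaker:
    """Holds advertisements; it never runs jobs itself."""
    ads: dict = field(default_factory=dict)

    def advertise(self, resource: Resource) -> None:
        # Store a snapshot of the resource's state at advertising time.
        self.ads[resource.host] = resource.idle

    def match(self) -> list:
        # Return hosts that were idle when they last advertised.
        return [host for host, idle in self.ads.items() if idle]


def run_agent(agent_host, matchmaker, resources):
    """Try each suggested match until one accepts the claim."""
    for host in matchmaker.match():
        if resources[host].claim(agent_host):
            return host
    return None
```

If a machine's owner returns after the machine has advertised, `match()` still lists it, but `claim()` refuses, and the agent moves on to the next suggestion.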

7 Deploying / Executing
[Diagram: the Agent (needy.cse.nd.edu) with a Shadow; the Lender (idle.cse.nd.edu) with a Sandbox holding job J. Messages shown: "Fork!", "Run job J.", "I need file /tmp/foo."]
Split Execution
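
Split execution can be illustrated with a toy model: the job runs in a sandbox on the lending machine, and any file it needs is fetched from the submitting machine through a shadow. This is a minimal sketch with hypothetical `Shadow` and `Sandbox` classes. Both sides are ordinary objects in one process here, which is a simplification; nothing below reflects Condor's actual interfaces.

```python
class Shadow:
    """Runs on the agent's machine and serves the job's file requests."""

    def __init__(self, home_files):
        # home_files: path -> contents on the submitting machine (assumed).
        self.home_files = home_files

    def read_file(self, path):
        return self.home_files[path]


class Sandbox:
    """Runs on the lending machine and confines the job."""

    def __init__(self, shadow):
        self.shadow = shadow
        self.requests = []  # record of every path the job asked for

    def open(self, path):
        # The job believes it is reading a local file; the sandbox
        # forwards the request to the shadow instead.
        self.requests.append(path)
        return self.shadow.read_file(path)

    def run(self, job):
        # The job receives only the sandbox's open(), not the real filesystem.
        return job(self.open)


def job_j(open_file):
    """Job J from the slide: it needs /tmp/foo and returns its length."""
    return len(open_file("/tmp/foo"))
```

Calling `Sandbox(Shadow({"/tmp/foo": "hello"})).run(job_j)` runs the job against the submitter's copy of the file, so the lending machine never needs its own `/tmp/foo`.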

8 Matching
- How are matches determined?
  - Policy
  - ClassAds
- Why independently claim a match?
- What if the Matchmaker dies?

9 ClassAds
Job ad:
    MyType = "Job"
    TargetType = "Machine"
    Requirements = ((other.Arch == "INTEL" && other.OpSys == "LINUX" && KeyboardIdle > 600))
    Cmd = "/tmp/a.out"
    Owner = "cmoretti"
Machine ad:
    MyType = "Machine"
    TargetType = "Job"
    Machine = "dustpuppy.cse.nd.edu"
    Requirements = ((KeyboardIdle > 600))
    Arch = "INTEL"
    OpSys = "LINUX"
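
The two ads on this slide match only if each side's Requirements accepts the other. The Python sketch below models that symmetric check. It is not the ClassAd language or its evaluator: ads are plain dictionaries, each Requirements expression is a hand-written callable, and `symmetric_match` is a hypothetical helper. Two further assumptions are that `KeyboardIdle` in the job's Requirements refers to the machine's attribute, and that the machine reports a value of 900, which the slide does not give.

```python
def symmetric_match(ad_a, ad_b):
    """Two ads match only if each one's Requirements accepts the other."""
    types_ok = (ad_a["TargetType"] == ad_b["MyType"]
                and ad_b["TargetType"] == ad_a["MyType"])
    return (types_ok
            and ad_a["Requirements"](ad_a, ad_b)
            and ad_b["Requirements"](ad_b, ad_a))


job_ad = {
    "MyType": "Job",
    "TargetType": "Machine",
    "Cmd": "/tmp/a.out",
    "Owner": "cmoretti",
    # Assumption: KeyboardIdle is read from the machine's ad.
    "Requirements": lambda my, other: (
        other.get("Arch") == "INTEL"
        and other.get("OpSys") == "LINUX"
        and other.get("KeyboardIdle", 0) > 600
    ),
}

machine_ad = {
    "MyType": "Machine",
    "TargetType": "Job",
    "Machine": "dustpuppy.cse.nd.edu",
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "KeyboardIdle": 900,  # assumed value; the slide does not give one
    # The machine only lends itself out when its own keyboard is idle.
    "Requirements": lambda my, other: my.get("KeyboardIdle", 0) > 600,
}
```

With these values `symmetric_match(job_ad, machine_ad)` is true; lowering the machine's `KeyboardIdle` below 600 makes both sides reject the match.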

10 Flocking
- Using another pool’s resources
  - Utilize more total resources
  - Find resources that match needs
- Two methods
  - Gateway flocking
  - Direct flocking

11 Gateway Flocking
- Each pool has a known “gateway”
- Gateways negotiate sharing
  - Advertise resources and needs
  - Transmit requests to local matchmaker
- Pool-level granularity
  - Accounting
  - Policy
- Now obsolete

12 Gateway Flocking
[Diagram: a Gateway, matchmakers (MM), resources (R), and an agent (A), with steps numbered 1 to 5]

13 Direct Flocking
- Agents report to other matchmakers
  - No gateways
  - Equivalent to being in multiple pools?
- Now the preferred (only) method
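
Direct flocking amounts to an agent reporting to several matchmakers rather than one. A minimal sketch, assuming each pool can be reduced to a list of idle host names and that the agent tries its home pool first; the pool contents, host names, and the `find_resource` helper are all hypothetical.

```python
def find_resource(pools, flock_order):
    """Ask each matchmaker in turn; return (pool, host) for the first hit.

    pools: pool name -> list of idle hosts known to that pool's matchmaker.
    flock_order: pool names to try, home pool first (an assumed policy).
    """
    for pool in flock_order:
        idle_hosts = pools.get(pool, [])
        if idle_hosts:
            # Remove the host so that it cannot be handed out twice.
            return pool, idle_hosts.pop(0)
    return None


pools = {
    "home": [],  # the home pool has nothing idle
    "partner": ["r1.partner.example", "r2.partner.example"],
}
```

Here `find_resource(pools, ["home", "partner"])` falls through the empty home pool and returns a host from the partner pool, with no gateway in between.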

14 Gateway Flocking
[Diagram: matchmakers (MM), resources (R), and an agent (A), with steps numbered 1 to 3; no gateway is shown]

15 Flocking Comparison
Gateway Flocking
+ Transparency
+ Fosters organization-level sharing
- Poor accounting
- Complicated
Direct Flocking
+ No gateways
+ Individual relationships supported
- Non-transparent
- Fewer organization-level agreements

16 Things Aren’t Perfect
- What happens if (when) …
  - Matchmaker goes down
  - Network or Agent fails during deploy
  - Resource or App fails during compute
- Non-dedicated machines
  - How do we keep owners happy?
  - What happens when an owner reclaims a resource?

17 Total Consumption in 2006
CPU-Hours Harnessed by Condor              (48%)   1161176
CPU-Hours Totally Unused                   (39%)    934277
CPU-Hours Consumed by Owner at Keyboard    (11%)    281003
CPU-Hours Total                           (100%)   2376456
Condor at Notre Dame: http://www.cse.nd.edu/~ccl/operations/condor/2005/users.html
Source: “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”, Douglas Thain
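
The figures on this slide are internally consistent, which a few lines of Python can confirm. The variable names are illustrative, and the reading that the slide's percentages were rounded down rather than to the nearest whole number is an assumption; it is the reading under which all three values are reproduced.

```python
harnessed = 1161176   # CPU-hours harnessed by Condor
unused = 934277       # CPU-hours totally unused
owner = 281003        # CPU-hours consumed by owner at keyboard
total = 2376456       # CPU-hours total, as reported on the slide

# The three categories account for every reported CPU-hour.
categories_sum = harnessed + unused + owner


def percent_floor(part, whole):
    """Whole-number percentage, rounded down (assumed rounding rule)."""
    return part * 100 // whole


shares = [percent_floor(x, total) for x in (harnessed, unused, owner)]
print(categories_sum == total, shares)  # True [48, 39, 11]
```

Because each share is rounded down, the three percentages add up to 98 rather than 100.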

18 Current Donors, Feb 2007
Owner            Nodes   CPUs   Storage (TB)
CRC/OIT             92     92    3.7
CSE                 73    124   11.7
Prof. Thain         59     91    5.5
Prof. Flynn         18     35    0.65
Prof. Striegel      10     20    0.65
Misc                 7     17
Total              259    379   20.2
Source: “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”, Douglas Thain

19 CPU History
[Chart not preserved in the transcript]
Source: “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”, Douglas Thain

20 Recap
- Condor facilitates distributed computation on dedicated or scavenged CPUs arranged by a matchmaker using ClassAds.
- Split Execution is necessary to fit the job’s needs to the environment.
- An agent can advertise to multiple matchmakers to examine more potential matches.

