Condor and the Grid. D. Thain, T. Tannenbaum, M. Livny. Presented by Christopher M. Moretti, 23 February 2007.

2 Problem & Opportunity
- Users need CPUs: scientific computing, mathematical modeling, data mining.
- Many CPU cycles go unused: personal workstations, general-use laboratories, research machines.

3 Solution: Condor
"A hunter of idle workstations" that:
- Keeps track of resources needed and available
- Determines and assigns matches
- Monitors progress
- Cleans up and reports results

4 Architecture
Three principals:
- Agent: a machine needing resources
- Matchmaker: the broker that pairs agents with resources
- Resource: a machine lending resources
Three phases:
- Advertising
- Matching/Claiming
- Deploying/Executing

5 Advertising
The agent (needy.cse.nd.edu) tells the matchmaker "I need X"; the lender (idle.cse.nd.edu) advertises "I have Y". The matchmaker then asks: does Y satisfy X?

6 Matching & Claiming
The matchmaker tells the agent "Use idle.cse.nd.edu" and tells the lender "Listen for needy.cse.nd.edu". The agent then claims the match directly: "Are you still available?" "Yes."

7 Deploying / Executing
The agent (needy.cse.nd.edu) forks a shadow process; the lender (idle.cse.nd.edu) runs job J inside a sandbox. The sandboxed job forwards requests such as "I need file /tmp/foo" back to the shadow. This pairing is called split execution.
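The shadow/sandbox pairing can be pictured as a small request-reply loop: the sandboxed job sends its I/O needs back to the shadow, which services them from the submitting user's environment. A toy Python sketch of this idea (the class names and in-process message passing are illustrative only; real Condor forwards remote system calls over the network):

```python
import os
import tempfile

class Shadow:
    """Runs on the agent's machine; answers the sandbox's requests
    using the submitting user's local files."""
    def fetch_file(self, path):
        with open(path, "rb") as f:
            return f.read()

class Sandbox:
    """Runs on the lender's machine; forwards I/O the job needs
    back to the shadow instead of touching local files directly."""
    def __init__(self, shadow):
        self.shadow = shadow

    def run_job(self, input_path):
        # The job says "I need file /tmp/foo"; the sandbox relays it.
        data = self.shadow.fetch_file(input_path)
        return len(data)  # stand-in for the job's real computation

# Demo: the shadow serves a file that exists only on the submit side.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"input data")
    path = f.name

result = Sandbox(Shadow()).run_job(path)
print(result)  # the job received all 10 input bytes
os.unlink(path)
```

The point of the split is that the job never needs the lender's machine to look like the agent's machine; every environment-specific request is answered on the submit side.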

8 Matching
- How are matches determined? By policy, expressed as ClassAds.
- Why must the agent independently claim a match?
- What happens if the matchmaker dies?

9 ClassAds
Job ad:
  MyType = "Job"
  TargetType = "Machine"
  Requirements = ((other.Arch=="INTEL" && other.OpSys=="LINUX" && KeyboardIdle>600))
  Cmd = "/tmp/a.out"
  Owner = "cmoretti"
Machine ad:
  MyType = "Machine"
  TargetType = "Job"
  Machine = "dustpuppy.cse.nd.edu"
  Requirements = ((KeyboardIdle>600))
  Arch = "INTEL"
  OpSys = "LINUX"
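Matchmaking is symmetric: a job and a machine match only if each ad's Requirements evaluates to true in the context of the other. A minimal Python sketch of that evaluation (the dictionary-based ads and lambda predicates are illustrative stand-ins, not HTCondor's actual ClassAd engine):

```python
# Each ad is a dict of attributes plus a Requirements predicate
# that receives (this ad, the candidate "other" ad).

def matches(ad_a, ad_b):
    """True only if each ad's Requirements is satisfied by the other."""
    return ad_a["Requirements"](ad_a, ad_b) and ad_b["Requirements"](ad_b, ad_a)

job_ad = {
    "MyType": "Job",
    "Cmd": "/tmp/a.out",
    "Owner": "cmoretti",
    # other.Arch=="INTEL" && other.OpSys=="LINUX" && KeyboardIdle>600
    "Requirements": lambda me, other: (
        other["Arch"] == "INTEL"
        and other["OpSys"] == "LINUX"
        and other["KeyboardIdle"] > 600
    ),
}

machine_ad = {
    "MyType": "Machine",
    "Machine": "dustpuppy.cse.nd.edu",
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "KeyboardIdle": 900,  # seconds since the owner last typed
    # KeyboardIdle>600: only lend the machine when its owner is away
    "Requirements": lambda me, other: me["KeyboardIdle"] > 600,
}

print(matches(job_ad, machine_ad))  # True: both Requirements hold
```

If the machine's owner returns to the keyboard (KeyboardIdle drops below 600), both predicates fail and no match is made, which is exactly how owner policy vetoes a job.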

10 Flocking
Using another pool's resources:
- Utilize more total resources
- Find resources that match needs
Two methods:
- Gateway flocking
- Direct flocking

11 Gateway Flocking
- Each pool has a known "gateway"
- Gateways negotiate sharing: they advertise resources and needs, and transmit requests to the local matchmaker
- Pool-level granularity for accounting and policy
- Now obsolete

12 Gateway Flocking (diagram: two pools, each with a matchmaker and resources, linked through a gateway; an agent in one pool reaches the other pool's resources via the gateways)

13 Direct Flocking
- Agents report to other matchmakers directly; no gateways
- Equivalent to being in multiple pools?
- Now the preferred (and only) method
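In present-day HTCondor, direct flocking is configured on the submitting machine with the FLOCK_TO setting, and the remote pool's central manager must authorize the incoming submitter (for example via FLOCK_FROM and matching security settings). A sketch with placeholder hostnames:

```
# condor_config on the submitting (agent) machine:
FLOCK_TO = cm.remote-pool.example.edu

# condor_config on the remote pool's central manager:
FLOCK_FROM = submit.home-pool.example.edu
```

Jobs that cannot be matched in the home pool are then advertised to the listed remote matchmakers in order.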

14 Direct Flocking (diagram: the agent contacts a second pool's matchmaker directly and, in numbered steps, is matched with that pool's resources; no gateway is involved)

15 Flocking Comparison
Gateway Flocking:
  + Transparent to agents
  + Fosters organization-level sharing
  - Poor accounting
  - Complicated
Direct Flocking:
  + No gateways
  + Individual relationships supported
  - Non-transparent
  - Fewer organization-level agreements

16 Things Aren't Perfect
What happens if (or when):
- The matchmaker goes down?
- The network or agent fails during deployment?
- The resource or application fails during computation?
Non-dedicated machines:
- How do we keep owners happy?
- What happens when an owner reclaims a resource?

17 Total Consumption in 2006
- CPU-hours harnessed by Condor: 48%
- CPU-hours totally unused: 39%
- CPU-hours consumed by owner at keyboard: 11%
- CPU-hours total: 100%
Source: "Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006", Douglas Thain

18 Current Donors, Feb 2007 (table: Owner / Nodes / CPUs / Storage (TB) for CRC/OIT, CSE, Prof. Thain, Prof. Flynn, Prof. Striegel, and Misc; the per-owner figures and totals did not survive transcription)
Source: "Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006", Douglas Thain

19 CPU History (chart from "Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006", Douglas Thain)

20 Recap
- Condor facilitates distributed computation on dedicated or scavenged CPUs, arranged by a matchmaker using ClassAds.
- Split execution is necessary to fit the job's needs to the execution environment.
- An agent can advertise to multiple matchmakers to examine more potential matches.