Jim Basney Computer Sciences Department University of Wisconsin-Madison Managing Network Resources in Condor

Outline
› Introduction: Improving Goodput
› Research Overview: Making the Network a Condor-Managed Resource
› Improving Goodput in Condor 6.2
› Conclusion

› Goodput = Allocation - Network Overhead
[diagram: a job's allocation timeline, showing network overhead at job placement, during remote I/O, at each periodic checkpoint, and at the preemption checkpoint that ends the allocation]
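
To make the equation concrete, here is a minimal Python sketch; the numbers are hypothetical and do not come from the talk:

# Hypothetical numbers: a 60-minute allocation that spends 2 minutes on
# job placement, 3 minutes on periodic checkpoints, and 1 minute on the
# preemption checkpoint.
allocation_min = 60.0
overhead_min = 2.0 + 3.0 + 1.0
goodput_min = allocation_min - overhead_min   # Goodput = Allocation - Network Overhead
print(f"{goodput_min:.0f} of {allocation_min:.0f} minutes are goodput "
      f"({goodput_min / allocation_min:.0%})")   # 54 of 60 minutes (90%)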

Improving Goodput
› Overlap network I/O with computation when possible
› Complete synchronous network I/O operations as quickly as possible
» Make network capacity an allocated resource
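
As an illustration of the first point, the Python sketch below overlaps a checkpoint transfer with computation by shipping the checkpoint in a background thread. This is a generic illustration, not Condor's implementation; a local file copy stands in for the transfer to a checkpoint server.

import shutil
import tempfile
import threading
from pathlib import Path

def send_checkpoint_async(local_ckpt: Path, dest: Path) -> threading.Thread:
    # Ship a locally written checkpoint in the background; the file
    # copy stands in for the network transfer.
    t = threading.Thread(target=shutil.copy, args=(local_ckpt, dest))
    t.start()
    return t

workdir = Path(tempfile.mkdtemp())
ckpt = workdir / "job.ckpt"
ckpt.write_bytes(b"checkpoint image")                  # fast local write
transfer = send_checkpoint_async(ckpt, workdir / "server.ckpt")
total = sum(i * i for i in range(10**6))               # keep computing meanwhile
transfer.join()                                        # block only when durability matters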

Outline
› Introduction: Improving Goodput
› Research Overview: Making the Network a Condor-Managed Resource
› Improving Goodput in Condor 6.2
› Conclusion

Matchmaking Framework: Advertisement
[diagram: the Customer Agent sends Resource Requests to the Matchmaker, while the Compute Server and the Network Manager each send Resource Offers]
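
A rough Python sketch of the three kinds of advertisements, written as plain dicts; the attribute names (RequiredNetworkMbps, AvailableMbps, and so on) are illustrative stand-ins, not Condor's actual ClassAd schema:

resource_request = {              # from the Customer Agent
    "Owner": "jbasney",
    "RequestCpus": 1,
    "RequiredNetworkMbps": 0.5,   # bandwidth for placement, I/O, checkpoints (hypothetical attribute)
}
resource_offer = {                # from the Compute Server
    "Machine": "c01.example.edu",
    "State": "Unclaimed",
}
network_offer = {                 # from the Network Manager
    "Subnet": "10.0.0.0/16",
    "AvailableMbps": 10.0,        # capacity not yet allocated (hypothetical attribute)
}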

Matchmaking Framework: Match Notification
[diagram: the Matchmaker notifies the Customer Agent, the Compute Server, and the Network Manager of the match]

Admission Control in the Matchmaker
› Only schedule jobs for which network capacity is available
» Transfer executable, input, and checkpoint files
» Transfer files for a preempted job

Admission Control in the Matchmaker (cont.)
› Some capacity is reserved for system goodput
» Schedule jobs with small network requirements on CPUs that would otherwise go idle because of limited network capacity
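
Continuing the dict-based sketch above, the admission test might look like the following; the reserve for system goodput and all attribute names remain hypothetical:

def admit(job, machine, network, reserve_mbps=1.0):
    # Match a job to a machine only if the Network Manager still has
    # unallocated capacity for its transfers; `reserve_mbps` is held
    # back so small-requirement jobs can use otherwise-idle CPUs.
    usable = network["AvailableMbps"] - reserve_mbps
    return machine["State"] == "Unclaimed" and job["RequiredNetworkMbps"] <= usable

print(admit({"RequiredNetworkMbps": 0.5},
            {"State": "Unclaimed"},
            {"AvailableMbps": 10.0}))   # True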

Matchmaking Framework: Claiming
[diagram: after notification, the Customer Agent claims the matched Compute Server and the allocated network resources from the Network Manager]

Network Manager
› Accepts claims for network resources
» Schedules placement & preemption transfers
» Allocates supplemental requests
» Supports advance reservations
› Incorporates feedback into future allocation decisions
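
A compact Python skeleton of these responsibilities; the class and its bookkeeping are a sketch under the assumptions above, not Condor's internal design (advance reservations are covered by the scheduling slide that follows):

class NetworkManager:
    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.allocated_mbps = 0.0
        self.feedback = {}   # measured rates, consulted for future allocations

    def claim(self, mbps):
        # Accept a claim (placement/preemption transfer or supplemental
        # request) only if unallocated capacity remains.
        if self.allocated_mbps + mbps > self.capacity_mbps:
            return False
        self.allocated_mbps += mbps
        return True

    def release(self, mbps, endpoint=None, observed_mbps=None):
        # Free the allocation and record the observed rate as feedback.
        self.allocated_mbps -= mbps
        if endpoint is not None:
            self.feedback[endpoint] = observed_mbps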

Network Scheduling
› Requests carry a start time, end time, minimum rate, and maximum rate
› Search forward or backward in time
› Placement policies: first fit, best rate, earliest completion / latest start
› Example: scheduling checkpoints before a shutdown event
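
As one example of these policies, here is a minimal first-fit search in Python over an integer timeline; representing reservations as (start, end, mbps) triples is an assumption made for the sketch:

def first_fit(reservations, demand_mbps, duration, capacity_mbps, horizon):
    # Scan forward for the first start time at which `demand_mbps`
    # fits under `capacity_mbps` for `duration` consecutive time units,
    # alongside the existing (start, end, mbps) reservations.
    for start in range(0, horizon - duration + 1):
        fits = all(
            sum(m for s, e, m in reservations if s <= t < e) + demand_mbps
            <= capacity_mbps
            for t in range(start, start + duration)
        )
        if fits:
            return start, start + duration
    return None

print(first_fit([(0, 5, 6.0)], demand_mbps=5.0, duration=3,
                capacity_mbps=10.0, horizon=20))   # (5, 8)

Iterating the candidate start times in reverse gives the latest-start variant, which matches the shutdown-event example: schedule each checkpoint as late as possible while still finishing before the deadline.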

Bandwidth Control
› Streams register with the network manager and send bandwidth requests
› The network manager allocates available bandwidth according to max-min fairness
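
Max-min fairness repeatedly gives every unsatisfied stream an equal share of the remaining capacity; streams that want less than their share keep only what they asked for, and the leftover is redistributed. A small Python sketch with hypothetical stream names and rates:

def max_min_allocate(requests_mbps, capacity_mbps):
    # Waterfilling: each round, split the remaining capacity evenly
    # among the streams that are still unsatisfied.
    alloc = {s: 0.0 for s in requests_mbps}
    remaining = capacity_mbps
    unsatisfied = dict(requests_mbps)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for stream, want in list(unsatisfied.items()):
            grant = min(want, share)
            alloc[stream] += grant
            remaining -= grant
            if want <= share:
                del unsatisfied[stream]
            else:
                unsatisfied[stream] = want - share
    return alloc

print(max_min_allocate({"s1": 1.0, "s2": 4.0, "s3": 10.0}, capacity_mbps=9.0))
# {'s1': 1.0, 's2': 4.0, 's3': 4.0}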

Outline
› Introduction: Improving Goodput
› Research Overview: Making the Network a Condor-Managed Resource
› Improving Goodput in Condor 6.2
› Conclusion

Job Goodput Statistics: condor_q
› % condor_q -goodput
-- Submitter: corduroy.ncsa.uiuc.edu
 ID   OWNER    SUBMITTED  RUN_TIME  GOODPUT  CPU_UTIL  Mb/s
 …    jbasney  3/2 12:…   …:07:…    …%       90.7%     …
 …    jbasney  3/2 12:…   …:48:…    …%       90.5%     …
 …    jbasney  3/2 12:…   …:09:…    …%       91.9%     …
 …    jbasney  3/2 12:…   …:24:…    …%       94.2%     0.15
› % condor_q -io
-- Submitter: corduroy.ncsa.uiuc.edu
 ID   OWNER    READ    WRITE    SEEK  XPUT   BUFSIZE  BLKSIZE
 …    jbasney  96.0 B  … KB     … B   … /s   … KB     32.0 KB
 …    jbasney  96.0 B  91.4 KB  … B   … /s   … KB     32.0 KB
 …    jbasney  96.0 B  … KB     … B   … /s   … KB     32.0 KB
 …    jbasney  96.0 B  … KB     … B   … /s   … KB     32.0 KB

Job Goodput Statistics: condor_userlog
› % condor_userlog -j …
Host/Job  Wall Time  Good Time  CPU Usage  Avg Alloc  Avg Lost  Goodput  Util
…         …          …          …          …          …         …%       98.3%
…         …          …          …          …          …         …%       90.1%
…         …          …          …          …          …:15      0.0%     0.0%
…         …          …          …          …          …         …%       93.4%
…         …          …          …          …          …:40      0.0%     0.0%
…         …          …          …          …          …         …%       92.5%
…         …          …          …          …          …         …%       93.6%
…         …          …          …          …          …         …%       92.8%
…         …          …          …          …          …         …%       88.6%
…         …          …          …          …          …         …%       95.2%
Total     2+16:…     …          …          …          …         …%       95.2%

Multiple Checkpoint Servers
› Checkpoint faster by writing to a local checkpoint server
› CKPT_SERVER_HOST defines the checkpoint server for each machine (e.g. CKPT_SERVER_HOST = ckpt.example.edu in that machine's Condor configuration)
› Jobs send checkpoints to the server configured at the execution site
› Works with flocking

Checkpoint Server Domains
› Condor sets the LastCkptServer attribute in the job ClassAd when a checkpoint is stored
› Job ClassAds can use LastCkptServer to request a machine close to their last checkpoint

Conclusion
› Network overheads reduce the efficiency of CPU allocations
› Overlap I/O with computation
› Allocate network resources in Condor
› Improving goodput in Condor 6.2
» condor_q & condor_userlog
» Multiple checkpoint servers