Condor – A Hunter of Idle Workstations


Condor – A Hunter of Idle Workstations. Michael J. Litzkow, Miron Livny, and Matt W. Mutka, Department of Computer Sciences, University of Wisconsin. Reporter: S.Y.C.

Outline: Abstract, Introduction, System Design, Performance, Conclusions. 2007-11-21 S.Y.C.

Abstract Condor operates in a workstation environment. It aims to maximize the utilization of workstations with as little interference as possible between the jobs it schedules and the activities of the people who own the workstations.

Abstract (Cont.) Condor identifies idle workstations and schedules background jobs on them. The system guarantees that the job will eventually complete, and that very little, if any, work will be performed more than once. Condor has proven to be an extremely effective means to improve the productivity of our computing environment.

Introduction Workstations are powerful machines capable of executing millions of instructions each second. The processing demands of an owner are much smaller than the capacity of the workstation he or she owns.

Introduction (Cont.) Users would like to take advantage of any available capacity they can access that can support their needs. Can we provide a high quality of service in a highly utilized network of workstations?

System Design The placement of background jobs should be transparent to users. If a remote site running a background job fails, the job should be restarted automatically at some other location to guarantee job completion.

System Design (Cont.) A workstation can serve as a source of remote cycles for others when it is not used by its owner. The mechanisms implementing the system are expected to consume very little capacity.

System Design (Cont.) The Condor Scheduling Structure

System Design (Cont.) The Remote Unix (RU) Facility: Remote Unix turns idle workstations into cycle servers. When someone resumes using a workstation that is executing a remote job, the job must be stopped. Checkpointing: When a job is removed from a remote location, RU checkpoints it.
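
The behaviour on this slide can be illustrated with a small sketch. It is only a toy model, not Condor's code: the Workstation class, the find_idle and owner_returns functions, and the waiting queue are hypothetical names introduced here, and the checkpoint step is reduced to a comment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Workstation:
    # Hypothetical model of a workstation; not a Condor data structure.
    name: str
    owner_active: bool = False
    job: Optional[str] = None


def find_idle(stations: List[Workstation]) -> Optional[Workstation]:
    """Return the first station with no active owner and no remote job."""
    for ws in stations:
        if not ws.owner_active and ws.job is None:
            return ws
    return None


def owner_returns(ws: Workstation, stations: List[Workstation],
                  queue: List[str]) -> Optional[Workstation]:
    """The owner resumes using `ws`: stop its remote job and move it."""
    ws.owner_active = True
    job, ws.job = ws.job, None  # the remote job is removed from this station
    if job is None:
        return None
    # In the system described on the slide, RU would checkpoint the job here.
    target = find_idle(stations)
    if target is None:
        queue.append(job)  # no idle station: the job waits
        return None
    target.job = job
    return target
```

In this sketch a displaced job either lands on another idle station or waits in the queue, so it is never simply dropped when an owner comes back.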

System Design (Cont.) The checkpointing of a program is the saving of the state of the program so that its execution can be restarted.
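
As a rough illustration of saving state and restarting from it, the sketch below checkpoints a toy job whose whole state is a small dictionary. This is an assumption made for the example: the slides describe checkpointing a running program, whereas this sketch only serializes application-level state with Python's pickle module. The function names run_job, checkpoint, and restart are hypothetical.

```python
import pickle


def run_job(state, steps):
    """Advance a toy job (summing 1..limit) by at most `steps` iterations."""
    for _ in range(steps):
        if state["next"] > state["limit"]:
            break
        state["total"] += state["next"]
        state["next"] += 1
    return state


def checkpoint(state):
    """Save the job state as bytes that can be moved to another machine."""
    return pickle.dumps(state)


def restart(blob):
    """Rebuild the job state from a checkpoint."""
    return pickle.loads(blob)


# Run part of the job, checkpoint it, then finish it from the saved state.
state = {"next": 1, "total": 0, "limit": 10}
run_job(state, 4)
saved = checkpoint(state)
resumed = run_job(restart(saved), 100)
```

Because the restarted job continues from the saved state, the first four iterations are not repeated.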

Performance Profile of user service requests

Performance (Cont.) Queue Length and Average Wait Ratio

Performance (Cont.) Utilization of Remote Resources

Performance (Cont.) Utilization for One Week

Performance (Cont.) Queue Lengths for One Week

Conclusions Networks of workstations have grown greatly in number in recent years. This paper discusses a system that effectively utilizes idle workstation capacity and presents a profile of its performance. About 75% of the time, the workstations were available as sources of remote cycles.