
MaGate Experiments on Scenarios GridGroup EIF, Feb 5th, 2009 Ye HUANG Pervasive Artificial Intelligence Group, Dept of Informatics, University of Fribourg, Switzerland Grid Group, Dept of Information and Communication Technologies, EIA-FR, Switzerland

1 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

2 MaGate Modules (1)
- Match Maker: makes scheduling decisions jointly with the local RMS
  - Currently, no difference between local jobs and received community jobs
- Module Controller: message center and service invoker
- MaGate Monitor: tracks MaGate events/logs for self-diagnostic and statistical purposes
- Others: auxiliary modules for simulating resources, jobs, and resource discovery

3 MaGate Modules (2)
- SIM-I: submits simulation jobs locally
- Job simulation parameters:
  - job size (estimated MIPS * seconds)
  - I/O size
  - number of requested CPUs (PE: processing element)
  - arrival time
  - expected finish time
  - priority
- Job generation from archived HPC workload trace files
  - Done and tested, but not used here because of its high time cost
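The job parameters listed above can be sketched as a small data class. This is a hypothetical illustration; the field names and the `SimJob` class are assumptions, not MaGate's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of the simulated-job parameters above;
# names are illustrative, not MaGate's real classes.
@dataclass
class SimJob:
    length_mi: float        # job size, estimated MIPS * seconds
    io_size: int            # I/O size
    num_pe: int             # number of requested processing elements
    arrival_time: float     # seconds since simulation start
    expected_finish: float  # expected finish time (seconds)
    priority: int = 0

    def est_runtime(self, pe_mips: float) -> float:
        # estimated run time on PEs of the given speed (in MIPS)
        return self.length_mi / pe_mips

# A job of length 1000 MIPS * 2000 sec, as in the experiment arguments later on
job = SimJob(length_mi=1000 * 2000, io_size=0, num_pe=4,
             arrival_time=0.0, expected_finish=4000.0)
```

On a 1000-MIPS PE such a job takes its nominal 2000 seconds; faster PEs shorten it proportionally.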

4 MaGate Modules (3)
- SIM-I: executes simulated jobs locally
- Resource simulation:
  - Massively Parallel Processor system (MPP)
  - number of PEs (processing elements)
  - local policy (space-shared FCFS)
  - time zone and usage cost
  - calendar (holiday policy)
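A minimal sketch of the space-shared FCFS policy named above, under assumed semantics: jobs start strictly in arrival order, a job starts only when enough PEs are free, and a running job holds its PEs exclusively until it finishes.

```python
import heapq

def fcfs_space_shared(jobs, total_pe):
    """jobs: list of (arrival, runtime, num_pe), sorted by arrival time.
    Returns the start time of each job. A sketch, not MaGate's code."""
    free_pe = total_pe
    running = []          # min-heap of (finish_time, num_pe)
    starts = []
    clock = 0.0
    for arrival, runtime, num_pe in jobs:
        clock = max(clock, arrival)
        # FCFS: wait (no skipping ahead) until enough PEs are released
        while free_pe < num_pe:
            finish, pe = heapq.heappop(running)
            clock = max(clock, finish)
            free_pe += pe
        starts.append(clock)
        free_pe -= num_pe                         # PEs are held exclusively
        heapq.heappush(running, (clock + runtime, num_pe))
    return starts
```

On a 4-PE MPP, two 2-PE jobs fill the machine and a later 4-PE job waits until both release their PEs; that head-of-queue blocking is exactly what backfilling variants later relax.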

5 MaGate Modules (4)
- Output Requester
  - Currently: delegates failed local jobs to available neighbors
  - To extend: delegate in-queue local jobs? Coupled with the local scheduling policy
- Input Requester
  - Processes incoming job delegation requests, checks the local community policy, and returns the answer
- Output Responder
  - Processes finished community jobs and sends them back to their original MaGate
- Input Responder
  - Processes the returned output jobs
- Community Monitor
  - Checks the CM status and adjusts the community policy if necessary
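The interplay of the four requester/responder modules can be sketched as one delegation round-trip. All names here (`accepts`, `execute`, `receive`) are assumptions for illustration, not MaGate's real API.

```python
# Illustrative round-trip of a delegated job: the initiator offers a failed
# job, the remote side's Input Requester checks its community policy, the
# job runs remotely, and the result is returned to the original MaGate
# (Output Responder on the remote side, Input Responder at the origin).
def delegate(job, origin, remote):
    if not remote.accepts(job):      # Input Requester: community-policy check
        return None                  # rejected: the initiator must try elsewhere
    result = remote.execute(job)     # remote local RMS executes the job
    origin.receive(result)           # Input Responder at the original MaGate
    return result
```

A rejected offer returns `None`, which is what drives the initiator-side candidate lists in the scenarios below.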

6 MaGate Modules (5)
- Data Storage
  - Currently: in-memory class
  - To extend: database (persistent)
- Res. Discovery
  - Currently: in memory
  - To extend: Amos? :-)
- Res. Monitoring
  - Not done yet
- Scheduling Policy
  - To be obtained by refactoring the MatchMaker (kernel module)

7 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

8 Experiment arguments (1)
- Site model
  - Sites contribute their computational resources and share their jobs
  - Typically, each site has its own local RMS to submit jobs to local resources
- Machine model
  - MPPs (Massively Parallel Processor systems)
  - PE stands for "Processing Element"
  - Each PE is a single processing system with local memory and storage; it uses a space-sharing policy and runs jobs exclusively
  - Different MPPs differ only in their number of PEs
- Job model
  - Currently only batch jobs are considered, which are dominant on most MPP systems
  - Each job is described by several parameters
  - A job's requested run time and number of PEs are assumed to be known (although this is hard to obtain in the real world)

9 Experiment arguments (2)
- For each experiment iteration:
  - 10 simultaneous MaGates
  - 1 resource (1 MPP) per MaGate
  - Total number of processors (PEs) per resource: [radix/2, radix] (e.g. if radix = 128, then [64, 128])
  - PE speed: around 1000 MIPS
  - 1000 jobs from each MaGate
  - Job length: around estimatedMIPS * estimatedSec = 1000 * 2000
  - Job arrival time: 0 ~ 43,200 sec (12 hours)
  - Bad job ratio: 30%
  - Good jobs: number of requested processors in [1, 5]
  - Bad jobs (the tough ones): number of requested processors in [radix/2, radix] (e.g. if radix = 128, then [64, 128])
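The workload described above can be sketched as a small generator. The parameter names `radix` and the good/bad split follow the slide; everything else (uniform arrival times, the dict layout, the seed) is an assumption for illustration.

```python
import random

# Illustrative sketch of the per-MaGate synthetic workload described above.
def make_jobs(n=1000, radix=128, bad_ratio=0.3, seed=42):
    rng = random.Random(seed)
    jobs = []
    for _ in range(n):
        bad = rng.random() < bad_ratio                # ~30% "tough" jobs
        jobs.append({
            "length_mi": 1000 * 2000,                 # estimatedMIPS * estimatedSec
            "num_pe": rng.randint(radix // 2, radix)  # bad: [radix/2, radix] PEs
                      if bad else rng.randint(1, 5),  # good: [1, 5] PEs
            "arrival": rng.uniform(0, 43_200),        # 0 .. 12 hours, in seconds
        })
    return jobs

jobs = make_jobs()
```

Since a resource has at most `radix` PEs, a bad job can only run on the largest MPPs, which is what makes the delegation scenarios below interesting.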

10 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

11 Experiment scenarios
- Going beyond a common assumption of the Grid:
  - Once a scheduling decision is made, the resource is supposed to execute the job successfully (unless the infrastructure fails)
  - Users know where (which grid / grid section) to submit their jobs

12 Experiment scenarios
- [1] Local job execution, simple success
- [2] Local job execution, simple mixed
- [3] Failed local job simple output - blind delegation
- [4] Failed local job simple output - larger PE preferred
- [5] Failed local job output, input job-queue & input accomplishment ratio limited
- [6] Failed local job output & re-negotiation on input-limited policy
- [7] Failed local job output constrained & re-negotiation on input-limited policy
- [8] In-queue waiting local job community delegation

13 Experiment scenarios
- [1] Local job execution, simple success (ideal cluster/grid scenario)

14 [1] Local job execution, simple success (ideal cluster/grid scenario)

15 Experiment scenarios
- [2] Local job execution, simple mixed (success & fail)
- Hmm… I agree, that plot is hard to understand, so let's go back to the histogram plot

16  [2] Job locally execution simple mixed (success & fail)

17 Experiment scenarios
- [3] Failed local job simple output - blind delegation
- Initiator: addresses of available neighboring MaGates (random list, size = 3)
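The blind-delegation initiator policy above amounts to sampling a small random candidate list. A sketch under assumed semantics (the function name and `seed` parameter are illustrative):

```python
import random

# Blind delegation: pick 3 reachable neighbor MaGates uniformly at
# random, with no preference among them.
def random_neighbors(neighbors, size=3, seed=None):
    rng = random.Random(seed)
    return rng.sample(neighbors, min(size, len(neighbors)))
```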

18  [3] Failed local job simple output  blind delegation

19  [3] Failed local job simple output  blind delegation

20 Experiment scenarios
- [4] Failed local job simple output - larger PE preferred
- Initiator: resources with larger PE counts are prioritized (ordered list, size = 3)
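The "larger PE preferred" initiator policy above is an ordered variant of the previous random list: rank the known neighbors by PE count, descending, and keep the top 3 as delegation candidates. A sketch with assumed names:

```python
# "Larger PE preferred": order neighbors by PE count, descending,
# and keep the top `size` as the candidate list.
def pick_candidates(neighbors, size=3):
    """neighbors: list of (magate_id, num_pe) pairs."""
    return sorted(neighbors, key=lambda n: n[1], reverse=True)[:size]
```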

21 Experiment scenarios
- [4] Failed local job simple output - larger PE preferred

22 Experiment scenarios
- [5] Failed local job output, input job-queue & input accomplishment ratio limited
- Initiator: larger PE prioritized
- Responder: the remote MaGate's input queue must be efficiently accessible
  - CONDITION_1: input queue length < QUEUE_LIMIT
  - CONDITION_2: real-time accomplishment ratio > RATIO_LIMIT
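The responder-side check of scenario [5] can be sketched as a single predicate. The concrete values of `QUEUE_LIMIT` and `RATIO_LIMIT` are assumed placeholders; the slides give no numbers.

```python
QUEUE_LIMIT = 50    # assumed placeholder value
RATIO_LIMIT = 0.8   # assumed placeholder value

def accept_delegation(queue_len: int, completed: int, attempted: int) -> bool:
    """Accept a delegated community job only if both hold:
    CONDITION_1: the input queue is shorter than QUEUE_LIMIT;
    CONDITION_2: the real-time accomplishment ratio exceeds RATIO_LIMIT."""
    ratio = completed / attempted if attempted else 1.0
    return queue_len < QUEUE_LIMIT and ratio > RATIO_LIMIT
```

Either a long queue or a poor accomplishment ratio alone is enough to refuse the job; re-negotiation (scenarios [6] and [7]) then relaxes one of the two conditions.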

23 Experiment scenarios
- [5] Failed local job output, input job-queue & input accomplishment ratio limited

24 Experiment scenarios
- [6] Failed local job output & re-negotiation on input-limited policy
- Initiator: larger PE prioritized
- Responder: the remote MaGate's input queue must be efficiently accessible
  - CONDITION_1: input queue length < QUEUE_LIMIT
  - CONDITION_2: real-time accomplishment ratio > RATIO_LIMIT
- Re-negotiation phase
  - For simplicity, only CONDITION_2 is relaxed here; it could be replaced by other, more interesting terms, e.g. cost/price

25  [6] Failed local job output & Re-negotiation on input-limited policy

26 Experiment scenarios
- [7] Failed local job output constrained & re-negotiation on input-limited policy
- Initiator:
  - larger PE prioritized (size = 3)
  - resource's numOfPE > job.numOfPE
- Responder: the remote MaGate's input queue must be efficiently accessible
  - CONDITION_1: input queue length < QUEUE_LIMIT
  - CONDITION_2: real-time accomplishment ratio > RATIO_LIMIT
- Re-negotiation may target different policies (for simplicity, only CONDITION_1 is relaxed here; it could be other, more interesting terms, e.g. cost/price)
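Scenario [7]'s constrained initiator policy adds a feasibility filter to the "larger PE preferred" ordering: only neighbors whose resource offers more PEs than the job requests are eligible, and among those the 3 with the most PEs are kept. A sketch with illustrative names:

```python
# Constrained initiator policy of scenario [7]: filter for feasibility
# (resource numOfPE > job numOfPE), then take the 3 largest resources.
def constrained_candidates(neighbors, job_num_pe, size=3):
    """neighbors: list of (magate_id, num_pe) pairs."""
    eligible = [n for n in neighbors if n[1] > job_num_pe]
    return sorted(eligible, key=lambda n: n[1], reverse=True)[:size]
```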

27  [7] Failed local job output constrained & Re-negotiation on input- limited policy

28 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

29 To-do: we can do a lot; which to focus on?
- Integrate resource discovery with the ant-based infrastructure
  - Improve the MaGate simulation platform together with the ant simulation
- Support more local policies (FCFS -> EASY backfilling)
- Couple community policy with local policy
  - e.g. output jobs from: local failed jobs, local long-waiting jobs (EASY backfilling), local low-priority jobs (flexible backfilling)
- Agreement-based solution to publish "the food" for the ants
  - No longer only resource configuration files -> agreement offers
- Agreement-based negotiation model
  - Flexible joint initiator & responder policies
  - Re-negotiation model (multiple handshakes :-) for complex job delegation
- Larger-scale experiment validation
- Real-system validation (interfaces to real RMSs, real job description standards, etc.)

30 To-think
- Grid task:
  - Find resources for executing jobs
  - Resources are considered static
  - Jobs are dynamic and movable
- Something new from the buzzwords…
  - Cloud computing & virtualization
  - Besides their business purpose, is there something new for academia, and for us?
  - Requesting resources for specific jobs
  - Jobs are static (user requirements)
  - Resources are dynamic (created on demand, erased after usage)
- What are the points then?
  - Resource provider/consumer & agreement initiator/responder
  - Roles of jobs and resources: exchangeable?
  - Definition of willingness (execute a job? execute a job at low cost?)
  - Definition of a resource (hardware profile? on-demand virtualized capability?)