MaGate Experiments on Scenarios GridGroup EIF, Feb 5th, 2009 Ye HUANG Pervasive Artificial Intelligence Group, Dept of Informatics, University of Fribourg,

1 MaGate Experiments on Scenarios
GridGroup EIF, Feb 5th, 2009
Ye HUANG
Pervasive Artificial Intelligence Group, Dept of Informatics, University of Fribourg, Switzerland
Grid Group, Dept of Information and Communication Technologies, EIA-FR, Switzerland

2 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

3 MaGate Modules (1)
- Match Maker: makes scheduling decisions jointly with the local RMS
  - Currently, no distinction between local jobs and received community jobs
- Module Controller: message center and service invoker
- MaGate Monitor: tracks the events/logs of MaGate for self-diagnostic and statistical purposes
- Others: auxiliary modules for simulating resources, jobs, and resource discovery

4 MaGate Modules (2)
- SIM-I: submitting simulated jobs locally
- Job simulation
  - Job size (estimated MIPS * sec)
  - I/O size
  - Number of requested CPUs (PE: processing element)
  - Arrival time
  - Expected finish time
  - Priority
- Job generation from archived HPC workload trace files
  - Done and tested, but not used here because it is too time-consuming
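The job parameters listed on this slide can be pictured as a small record. The sketch below is only an illustration: the class name `SimJob`, the field names, and the units are assumptions, not the actual MaGate classes.

```python
from dataclasses import dataclass


@dataclass
class SimJob:
    """Hypothetical record for one simulated job (names are assumptions)."""
    job_id: int
    est_mips: float              # estimated MIPS rating the job is sized for
    est_sec: float               # estimated run time in seconds at that rating
    io_size: float               # I/O size (unit not stated on the slide)
    num_pe: int                  # number of requested PEs
    arrival_time: float          # seconds since the start of the experiment
    expected_finish_time: float  # seconds since the start of the experiment
    priority: int = 0

    @property
    def size(self) -> float:
        """Job size as on the slide: estimated MIPS * seconds."""
        return self.est_mips * self.est_sec


job = SimJob(job_id=1, est_mips=1000, est_sec=2000, io_size=0.0,
             num_pe=4, arrival_time=120.0, expected_finish_time=5000.0)
```

With the values used later in the experiments (1000 MIPS, 2000 sec), `job.size` is 2,000,000 million instructions.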

5 MaGate Modules (3)
- SIM-I: executing simulated jobs locally
- Resource simulation
  - Massively Parallel Processor (MPP) system
  - Number of PEs (processing elements)
  - Local policy (space-shared FCFS)
  - Time zone and usage cost
  - Calendar (holiday policy)
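To make the space-shared FCFS local policy concrete, here is a minimal sketch, not the simulator's own code. It assumes jobs are served strictly in arrival order, each job holds its PEs exclusively until it finishes, and a job requesting more PEs than the machine owns can never start (returned as `None`). The function name and tuple layout are hypothetical.

```python
import heapq


def fcfs_space_shared(total_pe, jobs):
    """Return the start time of each job, or None if it can never fit.

    jobs: list of (arrival_time, num_pe, runtime) tuples, sorted by arrival.
    """
    free_pe = total_pe
    running = []   # min-heap of (finish_time, num_pe) for jobs holding PEs
    clock = 0.0
    starts = []
    for arrival, num_pe, runtime in jobs:
        if num_pe > total_pe:
            # Assumption: such a job fails locally.
            starts.append(None)
            continue
        # Strict FCFS: a job never starts before the job ahead of it.
        clock = max(clock, arrival)
        # Wait for running jobs to finish until enough PEs are free.
        while free_pe < num_pe:
            finish, released = heapq.heappop(running)
            clock = max(clock, finish)
            free_pe += released
        free_pe -= num_pe
        heapq.heappush(running, (clock + runtime, num_pe))
        starts.append(clock)
    return starts


starts = fcfs_space_shared(4, [(0, 3, 10), (1, 2, 5), (2, 1, 1), (3, 8, 1)])
```

In this example the third job waits behind the second even though one PE is free, which is the head-of-line blocking that backfilling policies try to avoid.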

6 MaGate Modules (4)
- Output Requester
  - Currently: sends failed local jobs to available neighbors
  - To extend: also handle queued local jobs? Coupled with the local scheduling policy
- Input Requester
  - Processes incoming job delegation requests, checks the local community policy, and gives the answer
- Output Responder
  - Processes finished community jobs and sends them back to their original MaGate
- Input Responder
  - Processes the returned output jobs
- Community Monitor
  - Checks the CM status and adjusts the community policy if necessary

7 MaGate Modules (5)
- Data Storage
  - Currently: in-memory class
  - To extend: database (persistent)
- Res. Discovery
  - Currently: in memory
  - To extend: Amos? :-)
- Res. Monitoring
  - Not done
- Scheduling Policy
  - To do: refactor the MatchMaker (kernel module)

8 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

9 Experiment arguments (1)
- Site Model
  - Sites contribute their computational resources and share their jobs
  - Typically, each site has its own local RMS to submit jobs to local resources
- Machine Model
  - MPPs (Massively Parallel Processor systems)
  - PE stands for “Processing Element”
  - Each PE is a single processing system with local memory and storage; it uses a space-sharing policy and runs jobs exclusively
  - Different MPPs differ only in the number of PEs
- Job Model
  - Currently only batch jobs are considered, which are dominant on most MPP systems
  - Each job comprises several parameters
  - The requested run time and number of PEs of a job are assumed to be known (although this is hard in the real world)

10 Experiment arguments (2)
- For each experiment iteration
  - 10 simultaneous MaGates
  - 1 resource (1 MPP) per MaGate
    - Total number of processors (PEs): [radix/2, radix] (e.g. if radix = 128, then [64, 128])
    - PE MIPS: around 1000
  - 1000 jobs from each MaGate
    - Job length: around estimatedMIPS * estimatedSec = 1000 * 2000
    - Job arrival time: 0 ~ 43,200 sec (12 hours)
    - Bad job ratio: 30%
    - Good job: number of requested processors in [1, 5]
    - Bad job (tough guys): number of requested processors in [radix/2, radix] (e.g. if radix = 128, then [64, 128])
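The parameters above are enough to sketch a workload generator for one MaGate. This is a hedged illustration, not the experiment's actual generator: the function name and dictionary keys are hypothetical, uniform sampling is an assumption (the slide does not state the distributions), and the "around" values are fixed at 1000 * 2000 for simplicity.

```python
import random


def make_workload(radix=128, n_jobs=1000, bad_ratio=0.30,
                  horizon=43_200, seed=0):
    """Generate one MaGate's resource size and job list (assumed uniform)."""
    rng = random.Random(seed)
    # One MPP per MaGate with a PE count in [radix/2, radix].
    total_pe = rng.randint(radix // 2, radix)
    jobs = []
    for job_id in range(n_jobs):
        is_bad = rng.random() < bad_ratio
        if is_bad:
            num_pe = rng.randint(radix // 2, radix)   # "tough" job
        else:
            num_pe = rng.randint(1, 5)                # good job
        jobs.append({
            "id": job_id,
            "arrival": rng.uniform(0, horizon),       # within 12 hours
            "num_pe": num_pe,
            "length": 1000 * 2000,                    # estimatedMIPS * estimatedSec
            "bad": is_bad,
        })
    jobs.sort(key=lambda j: j["arrival"])
    return total_pe, jobs


total_pe, jobs = make_workload()
```

Because bad jobs and the local PE count are drawn from the same range, a bad job exceeds the local resource only some of the time; whether that is what makes a local job "fail" in the scenarios is an assumption here.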

11 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

12 Experiment scenarios
- Beyond one assumption of Grid
  - Once scheduling decisions are made, the resource is supposed to execute the job successfully (if the infrastructure has not failed)
  - Users know where (which grid/grid section) to submit their jobs

13 Experiment scenarios
- [1] Local job execution, simple success
- [2] Local job execution, simple mixed
- [3] Failed local job simple output: blind delegation
- [4] Failed local job simple output: larger PE preferred
- [5] Failed local job output; input job queue & input accomplishment ratio limited
- [6] Failed local job output & re-negotiation on input-limited policy
- [7] Failed local job output constrained & re-negotiation on input-limited policy
- [8] In-queue waiting local job community delegation

14 Experiment scenarios
- [1] Local job execution, simple success (ideal cluster/grid scenario)

15 [1] Local job execution, simple success (ideal cluster/grid scenario)

16 Experiment scenarios
- [2] Local job execution, simple mixed (success & fail)
- Umm… I agree, this plot is bad and hard to understand, so let’s go back to the histogram plot

17 [2] Local job execution, simple mixed (success & fail)

18 Experiment scenarios
- [3] Failed local job simple output: blind delegation
  - Initiator: addresses of available neighboring MaGates (random list, size = 3)
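A minimal sketch of the blind initiator policy, assuming "random list, size = 3" means three distinct neighbors drawn uniformly at random. The function name and the neighbor names are hypothetical.

```python
import random


def blind_candidates(neighbors, size=3, rng=None):
    """Pick up to `size` distinct neighbors at random, ignoring their state."""
    rng = rng or random.Random()
    return rng.sample(neighbors, min(size, len(neighbors)))


neighbors = ["magate-1", "magate-2", "magate-3", "magate-4", "magate-5"]
picked = blind_candidates(neighbors, rng=random.Random(42))
```

The failed job would then be offered to the picked MaGates; the order in which they are tried is not stated on the slide.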

19 [3] Failed local job simple output: blind delegation

20 [3] Failed local job simple output: blind delegation

21 Experiment scenarios
- [4] Failed local job simple output: larger PE preferred
  - Initiator: resources with more PEs are prioritized (ordered list, size = 3)
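The "larger PE preferred" initiator policy can be sketched as a sort by PE count followed by a cut to three entries. The function name and the mapping from neighbor name to PE count are assumptions made for illustration.

```python
def larger_pe_candidates(neighbors, size=3):
    """Return up to `size` neighbor names, largest PE count first.

    neighbors: dict mapping a neighbor name to its number of PEs.
    """
    return sorted(neighbors, key=neighbors.get, reverse=True)[:size]


ordered = larger_pe_candidates({"A": 64, "B": 128, "C": 96, "D": 80})
```

Unlike the blind policy, the initiator needs to know each neighbor's PE count here, for instance from resource discovery.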

22 Experiment scenarios
- [4] Failed local job simple output: larger PE preferred

23 Experiment scenarios
- [5] Failed local job output; input job queue & input accomplishment ratio limited
  - Initiator: larger PE prioritized
  - Responder: remote MaGate input queue efficiently accessible
    - CONDITION_1: < QUEUE_LIMIT
    - CONDITION_2: real-time accomplishment ratio > RATIO_LIMIT
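The responder check in scenario [5] can be sketched as a single predicate. Several points are assumptions: that CONDITION_1 compares the current input queue length with QUEUE_LIMIT, that the accomplishment ratio is completed input jobs divided by accepted input jobs, that both conditions must hold, and that a MaGate with no accepted jobs yet counts as ratio 1.0.

```python
def accept_input_job(queue_len, accepted, completed, queue_limit, ratio_limit):
    """Decide whether a remote MaGate accepts a delegated job (sketch)."""
    # CONDITION_1 (assumed): input queue length below QUEUE_LIMIT.
    cond_queue = queue_len < queue_limit
    # CONDITION_2: real-time accomplishment ratio above RATIO_LIMIT.
    ratio = completed / accepted if accepted else 1.0
    cond_ratio = ratio > ratio_limit
    return cond_queue and cond_ratio


ok = accept_input_job(queue_len=2, accepted=10, completed=8,
                      queue_limit=5, ratio_limit=0.5)
```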

24 Experiment scenarios
- [5] Failed local job output; input job queue & input accomplishment ratio limited

25 Experiment scenarios
- [6] Failed local job output & re-negotiation on input-limited policy
  - Initiator: larger PE prioritized
  - Responder: remote MaGate input queue efficiently accessible
    - CONDITION_1: < QUEUE_LIMIT
    - CONDITION_2: real-time accomplishment ratio > RATIO_LIMIT
  - Re-negotiation phase
    - For simplicity, only CONDITION_2 is overlooked here; other, more interesting terms could be re-negotiated instead, e.g. cost price
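One way to read the re-negotiation step is as a second round in which the responder waives CONDITION_2. The sketch below follows that reading, which is an interpretation of the slide's wording rather than a confirmed protocol. The function name, the candidate fields, and the assumption that CONDITION_1 compares queue length with QUEUE_LIMIT are all hypothetical.

```python
def negotiate(candidates, queue_limit, ratio_limit):
    """Return (accepting neighbor, whether re-negotiation was needed).

    candidates: ordered list of dicts with "name", "queue_len", and "ratio".
    """
    for waive_ratio in (False, True):      # round 1, then re-negotiation
        for c in candidates:
            ok_queue = c["queue_len"] < queue_limit
            ok_ratio = waive_ratio or c["ratio"] > ratio_limit
            if ok_queue and ok_ratio:
                return c["name"], waive_ratio
    return None, True                      # nobody accepted, even relaxed


candidates = [
    {"name": "A", "queue_len": 5, "ratio": 0.9},
    {"name": "B", "queue_len": 2, "ratio": 0.4},
]
winner, renegotiated = negotiate(candidates, queue_limit=3, ratio_limit=0.5)
```

Here no neighbor passes both conditions in the first round, so the job goes to B only after the ratio condition is waived.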

26 [6] Failed local job output & re-negotiation on input-limited policy

27 Experiment scenarios
- [7] Failed local job output constrained & re-negotiation on input-limited policy
  - Initiator:
    - Larger PE prioritized (size = 3)
    - Resource’s numOfPE > job.numOfPE
  - Responder: remote MaGate input queue efficiently accessible
    - CONDITION_1: < QUEUE_LIMIT
    - CONDITION_2: real-time accomplishment ratio > RATIO_LIMIT
  - Re-negotiation can use different policies (for simplicity, only CONDITION_1 is overlooked here; other, more interesting terms could be re-negotiated instead, e.g. cost price)
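The constrained initiator in scenario [7] adds a filter before the ordering step. A minimal sketch, assuming neighbors are described by a name-to-PE-count mapping (hypothetical) and using the strict comparison written on the slide:

```python
def constrained_candidates(neighbors, job_num_pe, size=3):
    """Keep neighbors whose numOfPE exceeds the job's, largest first."""
    fitting = {name: pe for name, pe in neighbors.items() if pe > job_num_pe}
    return sorted(fitting, key=fitting.get, reverse=True)[:size]


neighbors = {"A": 64, "B": 128, "C": 96, "D": 80, "E": 100}
targets = constrained_candidates(neighbors, job_num_pe=90)
```

Filtering first avoids sending a job to a resource that could never run it, which the unconstrained policies do not rule out.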

28 [7] Failed local job output constrained & re-negotiation on input-limited policy

29 Outline
- MaGate Modules
- Experiment arguments
- Experiment scenarios
- To-do and to-think

30 To-do: we can do a lot, which to focus on?
- Integrate res. discovery with the ant-based infrastructure
  - Improve the MaGate simulation platform together with the ant simulation
- Support more local policies (FCFS -> EASY backfilling)
- Couple community policy with local policy
  - E.g. output jobs from: locally failed jobs, local long-waiting jobs (EASY backfilling), local low-priority jobs (flexible backfilling)
- Agreement-based solution to publish “the food” for ants
  - No longer only resource configuration files -> agreement offers
- Agreement-based negotiation model
  - Flexible joint initiator policy & responder policy
  - Re-negotiation model (multi-shakes :-) for complex job delegation
- Larger-scale experiment validation
- Real system validation (interface to real RMS; real job description standards, etc.)

31 To-think
- Grid task:
  - Find resources for executing jobs
  - Resources are considered static
  - Jobs are dynamic and movable
- Something new from the buzzwords…
  - Cloud computing & virtualization
  - Besides their business purpose, is there something new for academia, and for us?
  - Requesting resources for specific jobs
  - Jobs are static (user requirement)
  - Resources are dynamic (created on demand, erased after usage)
- What is the point then?
  - Resource provider/consumer & agreement initiator/responder
  - Roles of jobs and resources: exchangeable?
  - Definition of willingness (execute job? execute job with low cost?)
  - Definition of resource (hardware profile? on-demand virtualized capability?)

