Agent-Based Resource Management for Grid Computing

Junwei Cao, Darren J. Kerbyson, Graham R. Nudd
Department of Computer Science, University of Warwick

Outline
- Research background
- Sweep3D: performance evaluation of parallel applications using PACE
- A4 (Agile Architecture and Autonomous Agents): a reference model for building large-scale distributed software systems
- ARMS: an Agent-based Resource Management System for grid computing
- PMA: a Performance Monitor and Advisor for ARMS
- Conclusions and future work

Research Background

Resource Management
The overall aim of resource management is to schedule applications efficiently on the resources available in the metacomputing environment. Approaches taken by existing systems include:
- APIs defined by the LDAP service
- objects as the main system abstraction throughout
- a matchmaker/entity structure
- agents, each acting as both a database and a resource broker
- a metaserver/servers structure
- a broker/agents structure

Performance Evaluation
Within the high performance community, such scheduling goals rely on accurate performance evaluation and prediction capabilities.

Multi-Agent Systems
Software agents have been accepted as a powerful high-level abstraction for modelling complex software systems. Agents are computer systems capable of flexible, autonomous action in dynamic, unpredictable, typically multi-agent domains. Key issues include:
- Knowledge representation
- Agent communication language
- Agent negotiation
- Agent coordination

Service Discovery
A service is an entity that can be used by a person, a program, or another service. Service advertisement and discovery technologies enable devices to cooperate and reduce configuration effort, a necessity in today's increasingly mobile computing environment.

Performance Evaluation Using PACE
- PACE toolkit
  - Layered framework
  - Object definition
  - Model creation
  - Mapping relations
- Sweep3D: a case study
  - Model decomposition
  - Parallel template
  - Validation on SGI O2000
  - Validation on Sun Ultra1

PACE Toolkit
Figure: the PACE toolkit. Application Tools (Source Code Analysis, Object Editor, Object Library) build the application model as PSL scripts; Resource Tools (CPU, Network, Cache) build the resource model as HMCL scripts; both are compiled and passed to the Evaluation Engine, which supports Performance Analysis and On-the-fly Analysis through a user interface.

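Layered Framework
Figure: the PACE layered framework. Layers, from the application domain downwards: Application (entry level), Subtask (sequential description), Parallel Template (parallel description) and Hardware (hardware description). The model takes parameters as input and produces a predicted time and a predictive trace; the layers are independent of one another.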

Object Definition
Figure: PACE object definitions. A software object has Identifier, Type, Include, External Variable Definition, Link, Options and Procedures sections, and refers to lower-level and higher-level objects. A hardware object describes the CPU (clc, flc, suif, ct), memory (L1 cache, L2 cache, main memory) and network (sockets, MPI, PVM).

Model Creation
Figure: source code passes through the SUIF front end into SUIF format; the ACT tool, with input from the user and a profiler, produces the application layer and parallelisation layer.
- Software model creation using the ACT tool (semi-automatic)
- Hardware model creation using the HMCL language

Mapping Relations
Figure: strict mapping between application source code and model scripts. Each serial part of the source maps to a subtask, the abstracted parallel part maps to the parallel template, and each serial part is evaluated against the hardware object (HMCL) to give a time Tx.
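The mapping evaluates each serial part against a hardware object to obtain Tx. A minimal sketch of that step, assuming Tx is a sum of operation counts (as a source analyser or profiler might report) weighted by per-operation costs from the hardware object's clc section. The types `OpCost` and `OpCount` and the function `serial_part_time` are hypothetical, not PACE's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry of a hardware object's clc section: the cost of one
   operation type on a given CPU (opcode names are illustrative only). */
typedef struct {
    const char *opcode;   /* e.g. "CMLL" */
    double cost_sec;      /* time for a single operation, in seconds */
} OpCost;

/* Hypothetical operation count for one serial part of the application. */
typedef struct {
    size_t op_index;      /* index into the hardware object's OpCost table */
    double count;         /* how many times the operation is executed */
} OpCount;

/* Tx = sum over all operations of (count * per-operation cost). */
double serial_part_time(const OpCount *counts, size_t ncounts,
                        const OpCost *costs, size_t ncosts)
{
    double tx = 0.0;
    for (size_t i = 0; i < ncounts; i++) {
        assert(counts[i].op_index < ncosts); /* every op needs a cost entry */
        tx += counts[i].count * costs[counts[i].op_index].cost_sec;
    }
    return tx;
}
```

In a model of this shape, re-evaluating the same counts against a different hardware table (an SGI Origin2000 table instead of a Sun Ultra1 one, say) is all a cross-platform comparison requires.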

Overview of Sweep3D
- Sweep3D is part of the Accelerated Strategic Computing Initiative (ASCI) application suite.
- Sweep3D solves a 1-group, time-independent, discrete ordinates (Sn), 3D Cartesian (XYZ) geometry neutron transport problem.
- Sweep3D exploits parallelism through a wavefront process.
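The wavefront pipelines blocks of work diagonally across a 2D processor grid. The sketch below counts pipeline steps under a deliberately simplified assumption: every block takes one unit step on every processor and communication is free. It illustrates the wavefront idea only and is not the PACE model of Sweep3D; the function names are hypothetical.

```c
#include <assert.h>

/* A sweep starting at corner (1,1) of an npe_i x npe_j grid reaches
   processor (x,y) after (x-1)+(y-1) unit steps. */
int wavefront_start_step(int x, int y)
{
    assert(x >= 1 && y >= 1);
    return (x - 1) + (y - 1);
}

/* With nblocks blocks pipelined one step apart, the last processor
   (npe_i,npe_j) finishes its final block after
   nblocks + npe_i + npe_j - 2 steps (one step per block assumed). */
int wavefront_total_steps(int npe_i, int npe_j, int nblocks)
{
    assert(npe_i >= 1 && npe_j >= 1 && nblocks >= 1);
    return wavefront_start_step(npe_i, npe_j) + nblocks;
}
```

With many blocks per sweep the pipeline fill cost of npe_i + npe_j - 2 steps is amortised; with few blocks it dominates. In the parallel template, each octant pipelines mmo × kb blocks.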

Model Decomposition
Figure: decomposition of the Sweep3D model.
- Application layer: sweep3d
- Subtask layer: source, sweep, fixed, flux_err
- Parallel template layer: async, pipeline, global_sum, global_max
- Hardware layer: SgiOrigin2000, SunUltra1

Parallel Template
PACE parallel template (pipeline):

    partmp pipeline {
      proc exec init {
        step cpu { confdev Tx_sweep_init; }
        for( phase = 1; phase <= 8; phase = phase + 1 ) {
          step cpu { confdev Tx_octant; }
          step cpu { confdev Tx_get_direct; }
          for( i = 1; i <= mmo; i = i + 1 ) {
            step cpu { confdev Tx_pipeline_init; }
            for( j = 1; j <= kb; j = j + 1 ) {
              step cpu { confdev Tx_kk_loop_init; }
              for( x = 1; x <= npe_i; x = x + 1 )
                for( y = 1; y <= npe_j; y = y + 1 ) {
                  myid = Get_myid( x, y );
                  ew_rcv = Get_ew_rcv( phase, x, y );
                  if( ew_rcv != 0 )
                    step mpirecv { confdev ew_rcv, myid, nib; }
                  else
                    step cpu on myid { confdev Tx_else_ew_rcv; }
                }
              step cpu { confdev Tx_comp_face; }
              for( x = 1; x <= npe_i; x = x + 1 )
                for( y = 1; y <= npe_j; y = y + 1 ) {
                  myid = Get_myid( x, y );
                  ns_rcv = Get_ns_rcv( phase, x, y );
                  if( ns_rcv != 0 )
                    step mpirecv { confdev ns_rcv, myid, njb; }
                  else
                    step cpu on myid { confdev Tx_else_ns_rcv; }
                }
              step cpu { confdev Tx_work; }
            }
            step cpu { confdev Tx_last; }
          }
        }
      }
    }

Corresponding C source:

    void sweep()
    {
      sweep_init();
      for( iq = 1; iq <= 8; iq++ ) {
        octant();
        get_direct();
        for( mo = 1; mo <= mmo; mo++ ) {
          pipeline_init();
          for( kk = 1; kk <= kb; kk++ ) {
            kk_loop_init();
            if (ew_rcv != 0)
              info = MPI_Recv(Phiib, nib, MPI_DOUBLE, tids[ew_rcv], ew_tag, MPI_COMM_WORLD, &status);
            else
              else_ew_rcv();
            comp_face();
            if (ns_rcv != 0)
              info = MPI_Recv(Phijb, njb, MPI_DOUBLE, tids[ns_rcv], ns_tag, MPI_COMM_WORLD, &status);
            else
              else_ns_rcv();
            work();
          }
          last();
        }
      }
    }

Hardware configuration (excerpt):

    config SgiOrigin2000 {
      hardware { }
      pvm { }
      mpi {
        DD_COMM_A = 512, DD_COMM_B = , DD_COMM_C = , DD_COMM_D = , DD_COMM_E = ,
        DD_TRECV_A = 512, DD_TRECV_B = , DD_TRECV_C = , DD_TRECV_D = , DD_TRECV_E = ,
        DD_TSEND_A = 512, DD_TSEND_B = , DD_TSEND_C = , DD_TSEND_D = , DD_TSEND_E = ,
      }
      clc {
        ....
        CMLL = , CMLG = , CMSL = , CMSG = , CMCL = , CMCG = , CMFL = , CMFG = ,
        ....
      }
    }

Validation on SGI O2000

Validation on Sun Ultra1
Figure: model and measured run time (sec) against number of processors, for grid sizes 15x15x…, 25x25x…, 35x35x… and 50x50x….

PACE Summary
- Accurate prediction results: 15% error at most
- Rapid evaluation time: typically less than 2 s
- Easy cross-platform comparisons
- Scalability
  - Multiple administrative domains
  - Millions of computing resources
- Adaptability
  - Communication irregularities
  - Changing performance
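The accuracy figure can be read as the largest relative difference between model and measured run time over a set of validation runs. A minimal sketch, assuming that definition of error; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* |predicted - measured| / measured, computed without libm. */
double relative_error(double predicted, double measured)
{
    assert(measured > 0.0);
    double diff = predicted - measured;
    if (diff < 0.0)
        diff = -diff;
    return diff / measured;
}

/* Worst-case relative error over n (model, measured) run-time pairs. */
double max_relative_error(const double *predicted, const double *measured,
                          size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = relative_error(predicted[i], measured[i]);
        if (e > worst)
            worst = e;
    }
    return worst;
}
```

Under this definition, "15% error at most" means max_relative_error stays at or below 0.15 for every validated configuration.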

The Question Is …

A4 Methodology
- Agility: quick adaptation to a changing environment
- Architecture: a clue to the components in a system
- Autonomy: acting without direct intervention
- Agent: a high-level abstraction of complex systems
Topics: agent hierarchy, agent structure, service discovery, service advertisement, agent capability tables, performance metrics, the A4 simulator, and anything else…

Agent Hierarchy
An agent is:
- a local manager
- a user middleman
- a broker
- a coordinator
- a service provider
- a service requestor
- a matchmaker
- a router

Agent Structure
Each agent has three layers: communication, coordination and local management.
- Communication Layer: agents in the system must be able to communicate with each other using common data models and communication protocols.
- Coordination Layer: data an agent receives at the communication layer is interpreted and passed to the coordination layer, which decides how the agent should act on the data according to its own knowledge.
- Local Management Layer: an agent acting as a local manager is responsible for maintaining local services and for providing the service information the coordination layer needs to make decisions.
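The division of labour between the layers can be sketched as a single coordination decision. This assumes, for illustration only, that the coordination layer's "own knowledge" is an estimated completion time supplied by the local management layer; `Request`, `LocalService` and `coordinate` are hypothetical names, not the A4 interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request, as decoded by the communication layer. */
typedef struct {
    int    app_id;        /* used by local management to look up its service */
    double deadline_sec;  /* required completion time */
} Request;

/* Hypothetical summary supplied by the local management layer. */
typedef struct {
    bool   has_app;          /* can the local resource run the application? */
    double est_finish_sec;   /* estimated completion time if run locally */
} LocalService;

typedef enum { SERVE_LOCALLY, FORWARD } Decision;

/* Coordination layer: act on the decoded request using local knowledge. */
Decision coordinate(const Request *req, const LocalService *local)
{
    assert(req != NULL && local != NULL);
    if (local->has_app && local->est_finish_sec <= req->deadline_sec)
        return SERVE_LOCALLY;
    return FORWARD; /* communication layer sends the request to another agent */
}
```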

Service Discovery
Figure: service discovery and service advertisement between agents, each built from a communication layer, a coordination layer and a local management layer.

Service Advertisement
- Full service advertisement requires no service discovery.
- No service advertisement results in complex service discovery.
A balance must be struck between the two. An advertising agent says, in effect, "Hi, please find attached my service information."; a discovering agent asks, "Hi, could you please give me some service information that you have?"

Agent Capability Tables
The processes of service advertisement and discovery correspond to the maintenance and lookup of the agent capability tables (ACTs).
ACTs vary by source:
- T_ACT: service information about local resources
- L_ACT: service information coming from lower agents
- G_ACT: service information coming from the upper agent
- C_ACT: service information cached during discovery
Strategies:
- Data-push: submit service information to other agents
- Data-pull: request service information from other agents
- Periodic: maintain ACTs periodically
- Event-driven: maintain ACTs in response to system events
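A minimal sketch of discovery as ACT lookup. The entry layout, the fixed table size and the lookup order T_ACT, C_ACT, L_ACT, G_ACT are assumptions made for illustration; `ActEntry`, `act_lookup` and `act_insert` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ACT entry: a service, the agent it is reached through,
   and its relative performance value. */
typedef struct {
    int    service_id;
    int    agent_id;
    double performance;
} ActEntry;

typedef struct {
    ActEntry entries[16];  /* fixed capacity, an assumption for the sketch */
    size_t   n;
} Act;

enum { T_ACT, L_ACT, G_ACT, C_ACT, NUM_ACTS };

typedef struct {
    Act acts[NUM_ACTS];
} Agent;

/* Search one table for the service with at least the required performance. */
static const ActEntry *act_find(const Act *act, int service_id, double required)
{
    for (size_t i = 0; i < act->n; i++) {
        const ActEntry *e = &act->entries[i];
        if (e->service_id == service_id && e->performance >= required)
            return e;
    }
    return NULL;
}

/* Search the tables in an assumed order; return the table index where the
   service was found, or -1 if the agent must ask another agent. */
int act_lookup(const Agent *a, int service_id, double required,
               const ActEntry **out)
{
    static const int order[] = {T_ACT, C_ACT, L_ACT, G_ACT};
    for (size_t k = 0; k < sizeof order / sizeof order[0]; k++) {
        const ActEntry *e = act_find(&a->acts[order[k]], service_id, required);
        if (e != NULL) {
            if (out != NULL)
                *out = e;
            return order[k];
        }
    }
    return -1;
}

/* Both push and pull end by storing an entry; returns -1 if the table is full. */
int act_insert(Act *act, ActEntry e)
{
    if (act->n >= sizeof act->entries / sizeof act->entries[0])
        return -1;
    act->entries[act->n++] = e;
    return 0;
}
```

In this sketch a data-push strategy would call act_insert on a neighbour's L_ACT or G_ACT when local services change, while a data-pull strategy would query a neighbour after act_lookup returns -1 and cache the answer in C_ACT.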

Performance Metrics
- Discovery speed
- System efficiency
- Load balancing
- Success rate
These metrics can conflict with one another.
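The slide names the metrics without formulas. One plausible way to compute two of them from simulation counters is sketched below, assuming r is the number of requests handled, a the number of connections used for advertisement and d the number used for discovery. These definitions are assumptions for illustration, not the A4 definitions.

```c
#include <assert.h>

/* Hypothetical simulation counters. */
typedef struct {
    long r; /* requests handled */
    long a; /* connections spent on service advertisement */
    long d; /* connections spent on service discovery */
} Counters;

/* Assumed discovery speed: requests served per discovery connection. */
double discovery_speed(Counters c)
{
    assert(c.d > 0);
    return (double)c.r / (double)c.d;
}

/* Assumed system efficiency: requests served per connection of any kind. */
double system_efficiency(Counters c)
{
    assert(c.a + c.d > 0);
    return (double)c.r / (double)(c.a + c.d);
}
```

Under these definitions the conflict is direct: more advertisement (larger a) tends to reduce discovery work d and so raise discovery speed, but it also adds to a + d, which can lower system efficiency.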

A4 Simulator
Figure: A4 simulator structure. Inputs: agent hierarchy, requests, services and strategies (agent mobility, request distribution, service distribution, global strategies). Kernel: performance model, model composer and simulation engine, supporting agent-level and system-level modelling. Output through the GUI: step-by-step view, log view, accumulative view and agent view.

A4 Simulator Implementation
Screenshots: model browser, agent viewer, model viewer, simulation results.
- Support for all performance metrics
- Support for all strategy configurations
- Two-level performance modelling
- Multi-view display of simulation results
- Comparison of strategies
- Agent mobility modelling

A Case Study
Figure: agent hierarchy, and the impact of service mobility on discovery performance. Chart annotations: mobility impact, learning process, higher performance, stable state, new learning process.

Summary
A4 is a reference model for building large-scale distributed software systems with highly dynamic behaviour.
A4 + PACE → ARMS

ARMS for Grid Computing
- ARMS in context
- ARMS architecture
- ARMS agent structure
  - Service information
  - Request information
  - Multi-processor scheduling
- ARMS implementation
- A case study
  - Agents & resources
  - Applications & requests
  - Experiment results I
  - Experiment results II
At the local level, PACE functions supply accurate performance information. At the meta level, agents cooperate with each other for service discovery.

ARMS in Context
Figure: components shown are ARMS, grid resources, grid users, A4, PACE (Application Tools (AT), Resource Tools (RT), Evaluation Engine (EE)), the A4 simulator and the PMA.

ARMS Architecture
Figure: ARMS architecture. Elements: users, agents (each with an ACT and a PACE evaluation engine, EE), processors, application models (built with AT), resource models (built with RT), cost models, and the PMA (annotated "Bottleneck?").

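ARMS Agent Structure
Figure: an ARMS agent in three layers.
- Communication layer: the communication module, exchanging advertisement and discovery messages with other agents (by agent ID).
- Coordination layer: the ACT manager and ACTs, the matchmaker, the scheduler and the PACE evaluation engine. The application model is evaluated by the engine; the scheduler combines the evaluation results with the cost model to produce a scheduling cost.
- Local management layer: application management, resource allocation and resource monitoring, which handle application execution and supply application, service and resource information.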

Service Information
Figure: service information consists of resource information (processors 1 to n, each with an ID, a type and a PACE resource model), application information (applications 1 to m, each with an ID, a start time and an end time) and the application-resource mapping.
The ACT manager controls the agent's access to the ACT database, where service information about grid resources is recorded.
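Given application-resource mapping records with start and end times, an agent can derive when each processor becomes free. A minimal sketch, assuming new work is only appended after a processor's last allocation (idle gaps are not reused); `Allocation` and `processor_free_time` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-to-processor mapping record. */
typedef struct {
    int    app_id;
    int    proc_id;
    double start_time;
    double end_time;
} Allocation;

/* Earliest time processor proc_id is free: the latest end time of any
   allocation on it, or `now` if it has none pending. */
double processor_free_time(const Allocation *allocs, size_t n,
                           int proc_id, double now)
{
    double free_at = now;
    for (size_t i = 0; i < n; i++) {
        assert(allocs[i].start_time <= allocs[i].end_time);
        if (allocs[i].proc_id == proc_id && allocs[i].end_time > free_at)
            free_at = allocs[i].end_time;
    }
    return free_at;
}
```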

Request Information
A request sent to an ARMS agent should include all information related to the application and its execution requirements:
- PACE application model: includes all performance-related information about the application to be executed, and can be input to and evaluated by the PACE evaluation engine.
- Cost model: includes all performance metrics, with corresponding values, that must be met by a grid service provided by a grid resource, such as execution time and memory usage.
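A minimal sketch of checking a prediction against a cost model, assuming the cost model holds just the two example metrics named above (execution time and memory usage) as upper bounds. `CostModel`, `Prediction` and `meets_cost_model` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cost model: limits a grid service must meet. */
typedef struct {
    double max_exec_time_sec;
    double max_memory_mb;
} CostModel;

/* Hypothetical prediction for one resource, e.g. obtained by evaluating
   the PACE application model against that resource's model. */
typedef struct {
    double exec_time_sec;
    double memory_mb;
} Prediction;

/* A service qualifies only if every metric in the cost model is met. */
bool meets_cost_model(const CostModel *cm, const Prediction *p)
{
    assert(cm != NULL && p != NULL);
    return p->exec_time_sec <= cm->max_exec_time_sec &&
           p->memory_mb <= cm->max_memory_mb;
}
```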

Multi-processor Scheduling
Figure: schedule of applications across processors 1 to 8.
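One simple way to use per-processor-count predictions in multi-processor scheduling is a greedy choice of how many processors to use. The sketch assumes PACE supplies a predicted execution time for every processor count and that a job must start on all its processors at once. It is a hypothetical illustration, not necessarily the ARMS scheduling algorithm.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_double(const void *x, const void *y)
{
    double a = *(const double *)x, b = *(const double *)y;
    return (a > b) - (a < b);
}

/* free_at[i]: time processor i becomes free (sorted in place).
   pred[k-1]: predicted execution time on k processors.
   On the k earliest-free processors the job can start at free_at[k-1],
   so completion(k) = free_at[k-1] + pred[k-1]. Returns the best k and
   stores its completion time in *completion. */
size_t best_processor_count(double *free_at, const double *pred,
                            size_t nproc, double *completion)
{
    assert(nproc > 0);
    qsort(free_at, nproc, sizeof free_at[0], cmp_double);
    size_t best_k = 1;
    double best = free_at[0] + pred[0];
    for (size_t k = 2; k <= nproc; k++) {
        double t = free_at[k - 1] + pred[k - 1];
        if (t < best) {
            best = t;
            best_k = k;
        }
    }
    if (completion != NULL)
        *completion = best;
    return best_k;
}
```

Because existing allocations delay some processors, the configuration that runs fastest in isolation need not finish first.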

ARMS Implementation
Screenshots: auto clients, info browser, Gantt chart, agent platform.
- C/C++ and X Windows
- Simple data structures for data representation
- File system for data management and agent communication
- Multi-threaded agent kernel

A Case Study
- 8 agents, 8 grid resources, 16 × 8 = 128 processors
  - SGI Origin2000, Sun clusters
- 7 applications, 149 requests
  - Sweep3D, fft, jacobi, memsort, etc.
- Roughly 1 request every 3 seconds for 7 minutes
  - Request frequency, application and receiving agent chosen at random
- Discovery: 16% one-step, 7% two-step
- Application distribution between 7% and 19%
- 97% success rate

Agents & Resources
Figure: agent hierarchy of coke, burroughs, budweiser, sprite, origin, tizer, rubbish and gem.

    Agent       Resource type         #Processors/hosts
    gem         SGI Origin2000        16
    origin      SGI Origin2000        16
    sprite      Sun Ultra             16
    tizer       Sun Ultra             16
    coke        Sun Ultra 1           16
    budweiser   Sun Ultra 5           16
    burroughs   Sun SPARCstation 2    16
    rubbish     Sun SPARCstation 2    16

Applications & Requests

Experiment Results I
Figure: applications on agent tizer.

Experiment Results II
Figure: application distribution and statistical results.

The Answer Is ARMS

PMA Agent
- PMA structure
- Performance optimisation strategies
  - Use of ACTs
  - Limit service lifetime
  - Limit scope of service advertisement and discovery
  - Agent mobility and service distribution
- Performance steering policies
- A case study
  - Agents & strategies
  - Requests & services
  - Simulation results I
  - Simulation results II

PMA Structure
Figure: the PMA comprises a performance model, a model composer, a simulation engine, and monitoring and reconfiguration components. It takes statistical data from ARMS agents and works with strategies and policies. The statistical data include:
- Relative request performance value
- Request sending frequency
- Relative service performance value
- Service performance changing frequency

Performance Optimisation Strategies
- Use of ACTs
- Limit service lifetime
- Limit scope of service advertisement and discovery
- Agent mobility and service distribution
Strategies vary by dynamics, hierarchy, distribution and pre-knowledge.

Performance Steering Policies
Policies for balancing workload between service advertisement and discovery:
- T_ACT: event-driven data-push
- C_ACT: event-driven data-pull and data-push
- L_ACT: avoid redundant advertisement
- G_ACT: avoid data-push
- Avoid using event-driven and periodic approaches simultaneously
- Avoid using data-pull and data-push approaches simultaneously

- Two-level performance steering
- Comparison of different combinations of strategies

A Case Study
- 251 agents in 3 layers
- System-level configuration of strategies
  - 4 ACT uses, 6 strategies
  - 13 experiments
- System-level definitions of services and requests
- Comparison of different combinations of strategies
  - A middle strategy is chosen as the best
  - Agent-level configuration may lead to better performance

Agents & Strategies

    Agents                   Upper agent
    gem                      -
    sprite~0 … sprite~49     gem
    tup~0 … tup~49           sprite~9
    cola~0 … cola~49         sprite~19
    tango~0 … tango~49       sprite~29
    pepsi~0 … pepsi~49       sprite~39

    Performance optimisation strategy             Experiments (one V per experiment)
    T_ACT: event-driven data-push                 V V V V V V
    C_ACT: event-driven data-push and data-pull   V V V V V
    L_ACT: event-driven data-push                 V V V V
    G_ACT: periodic data-pull every 10 steps      V V V
    L_ACT: periodic data-pull every 10 steps      V V
    G_ACT: event-driven data-push                 V

Requests & Services

Services:

    Name   Relative performance   Freq   Lifetime    Scope   Dist (%)
    HPC    1000                   5      Unlimited   Top     20
    HPC    600                    10     Unlimited   Top     40
    HPC    200                    20     Unlimited   Top     60

Requests:

    Name   Relative performance   Freq   Scope   Dist (%)
    HPC    100                    5      Top     80
    HPC    300                    10     Top     60
    HPC    500                    20     Top     40
    HPC    800                    40     Top     20
    HPC    1000                   60     Top     10

Simulation Results I
Figure: metrics r, a, d, v and e against experiment number.

Simulation Results II
Figure: metrics v and e against frequency (axis labels include "Never").

Conclusions
Main contributions:
- Performance-prediction-driven QoS support for grid resource management
- An agent-based hierarchical model for grid resource advertisement and discovery
- Simulation-based performance optimisation and steering of service discovery in large-scale multi-agent systems
Together, these provide a methodology and a prototype implementation of agent-based resource management for grid computing, which can serve as a foundation for further improvement and refinement.

Future Work
- Java implementation
- Agent communication language
- Resource specification language
- Use of Globus, LDAP, SNMP, XML, etc.
- New performance optimisation strategies
- New performance steering policies
- Use of historical information for service discovery
- New protocols to support stronger QoS
- Agent-level hardware configuration
- New distributed scheduling algorithms
- Advanced multi-processor scheduling algorithms
- PACE light up
- Experiments on IBM S/390
