Published by Amberlynn Phelps. Modified over 9 years ago.
Evaluation of Agent Teamwork High Performance Distributed Computing Middleware

Solomon Lane, Agent Teamwork Research Assistant, October 2006 – March 2007

Job Dispatch and Termination Performance: Agent Teamwork vs. Globus/OpenPBS
Framework Execution Performance: Agent Teamwork vs. MPIJava

Terminology

Grid vs. Cluster
A computing grid is commonly distinguished from a computing cluster by the geographic distance between members. A cluster is a group of computers in the same room or building, connected to the same physical network, while the members of a grid could be located anywhere and may be connected over several different networks.

Platform
I define an HPDC platform as software that provides infrastructure and scheduling services. Infrastructure services include authentication and authorization, job submission, and file transfer for job deployment. Scheduling services include dynamic resource identification and allocation, scheduling policies, and coordination of job execution.

Framework
I define a framework as a related set of software libraries used to write software in a particular programming model. The Single Program Multiple Data (SPMD) programming model is commonly used to achieve data-level parallelism in HPDC. MPIJava is a Java implementation of the Message Passing Interface standard, which provides a framework for programming in the SPMD model.

Agent Teamwork
AgentTeamwork is a mobile-agent-based job coordination system that targets a mixture of computing nodes, some directly connected to the public Internet, and others simply clustered in a private IP domain but not managed by a commodity job scheduler. [1]

Globus Toolkit
The Globus Toolkit is an open source software toolkit used for building Grid systems and applications. [2]

OpenPBS
OpenPBS is the original version of the Portable Batch System. It is a flexible batch queueing system developed for NASA in the early to mid-1990s. [3]
The purpose of the OpenPBS system is to provide additional controls over initiating or scheduling execution of batch jobs, and to allow routing of those jobs between different hosts. [4]

Message Passing Interface (MPI)
MPI is a library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementors, and users. MPI was designed for high performance on both massively parallel machines and on workstation clusters. [5]

MPICH-G2
MPICH-G2 is a grid-enabled implementation of the MPI v1.1 standard. Using services from the Globus Toolkit (e.g., job startup, security), MPICH-G2 allows you to couple multiple machines, potentially of different architectures, to run MPI applications. [6]

MPIJava
mpiJava is an object-oriented Java interface to the standard Message Passing Interface (MPI). [7]

[1] Fault-Tolerant Job Execution over Multi-Clusters using Mobile agents, Munehiro Fukuda, gca07.pdf
[2] http://www.globus.org/
[3] http://www.openpbs.org/about.html
[4] Overview of the OpenPBS, http://www.openpbs.org/overview.html
[5] What is MPI, http://www-unix.mcs.anl.gov/mpi/
[6] What is MPICH-G2, http://www3.niu.edu/mpi/
[7] http://www.hpjava.org/mpiJava.html

Overview
My goal as a research assistant was to evaluate Agent Teamwork's "Job Dispatch & Termination" and "Framework" performance against a contemporary alternative.

Job Dispatch & Termination Evaluation: I built a reference platform to compare Agent Teamwork against by integrating the Globus Toolkit with the OpenPBS scheduler and the MPICH-G2 MPI framework.

Framework Function Evaluation: To evaluate framework performance, I wrote three benchmark programs in both the Agent Teamwork MPI framework and the MPIJava framework and compared their runtimes.

Reference Platform Overview
To run a job on the reference platform, you generate a job definition file using the Resource Specification Language (RSL) and submit it along with your user certificate using globusrun. The GRAM client submits the job to a gatekeeper on the cluster head, which uses the GSI to authenticate and authorize the job submission. The gatekeeper then starts a job manager, which issues a callback to the GRAM client to connect standard error and standard output back to the client. The job manager then submits the job details to the PBS Server. The PBS Scheduler selects appropriate nodes from the cluster and transfers the executable to the PBS mom on the cluster nodes. The PBS mom launches the application. Applications are written in the MPICH-G2 framework, which uses the grid infrastructure to coordinate the parallel execution.

Results: These graphs compare job dispatch and termination time when submitting a test program to different numbers of cluster nodes in either a depth-first or breadth-first distribution. Agent Teamwork's job dispatch and termination performance was comparable with the reference platform in the depth-first distribution, and Agent Teamwork outperformed the reference platform with a large number of nodes in the breadth-first distribution.

Framework Results: Currently, two of the Agent Teamwork versions of the benchmark programs cannot be run across the clusters due to outstanding bugs in the framework. One of the benchmark programs, Wave2D, was able to run on a limited number of nodes. The graphs to the right show these partial results, which indicate that the Agent Teamwork version is at least one order of magnitude slower than MPIJava. At this point, however, framework debugging is ongoing.

The Clusters
The following tables describe the hardware that was used. There were a total of 66 machines divided into two clusters.
Medusa Cluster (a 32-node cluster for research use)

Head node:
  Specification                                  Outbound
  1.8GHz Xeon x2, 512MB memory, and 70GB HD      100Mbps

Computing nodes:
  #nodes  Specification                              Inbound
  24      3.2GHz Xeon, 512MB memory, and 36GB HD     1Gbps
  8       2.8GHz Xeon, 512MB memory, and 60GB HD     2Gbps

Phoebe Cluster (a 32-node cluster for instructional use)

Head node:
  Specification                                  Outbound
  1.5 memory, and 40GB HD                        100Mbps

Computing nodes:
  #nodes  Specification                              Inbound
  16      1.5GHz Xeon, 512MB memory, and 30GB HD     100Mbps
  16      1.5GHz Xeon, 512MB memory, and 30GB HD     1Gbps
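As a quick consistency check on the hardware tables, the sketch below tallies the computing nodes per cluster and the overall machine count. The node counts and link speeds are copied from the tables; the dictionary layout, the function names, and the constant HEAD_NODES_PER_CLUSTER are illustrative assumptions. In particular, it assumes one head node per cluster, which is what would reconcile two 32-node clusters with the stated total of 66 machines.

```python
# Compute-node groups per cluster, copied from the hardware tables.
# Each tuple is (number of nodes, inbound link speed).
CLUSTERS = {
    "Medusa": [(24, "1Gbps"), (8, "2Gbps")],
    "Phoebe": [(16, "100Mbps"), (16, "1Gbps")],
}

# Assumption: each cluster has exactly one head node.
HEAD_NODES_PER_CLUSTER = 1


def compute_nodes(cluster):
    """Return the number of computing nodes in the named cluster."""
    return sum(count for count, _speed in CLUSTERS[cluster])


def total_machines():
    """Return computing nodes plus head nodes across all clusters."""
    return sum(compute_nodes(name) + HEAD_NODES_PER_CLUSTER for name in CLUSTERS)


if __name__ == "__main__":
    for name in CLUSTERS:
        print(name, "computing nodes:", compute_nodes(name))
    print("total machines:", total_machines())
```

Under that assumption, each cluster contributes 32 computing nodes plus one head node, and the two clusters together account for all 66 machines.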