Agent Teamwork Research Assistant

Presentation transcript:

Agent Teamwork Research Assistant
Evaluation of Agent Teamwork: A High Performance Distributed Computing Middleware
Solomon Lane, Agent Teamwork Research Assistant, October 2006 – March 2007
For my internship over the last two quarters I worked as an undergraduate research assistant on AgentTeamwork, a joint project between UWB and Ehime University in Japan. The project is guided here by Professor Munehiro Fukuda, who was my advisor on the project. I'm going to start with a little background, then talk about the work I did and what I learned.

What is Agent Teamwork?
HPDC middleware that provides job dispatch and termination and a programming framework. It is under ongoing development.
HPDC (high performance distributed computing) is the use of distributed computing and parallel processing to improve application performance. Agent Teamwork provides two main functions:
Job dispatch and termination: dynamic node selection, file transfer for executable and data files, and starting, stopping, and returning results.
Programming framework: programmatic coordination of the distributed resources, which supports distributed application development.
Development is performance focused, and my job was to evaluate performance.

Project Objectives
Evaluate Agent Teamwork's performance against a contemporary alternative: job dispatch and termination performance, and framework performance. Build a reference platform. Write 3 benchmark programs that exercise the framework.
My main objective was to evaluate Agent Teamwork's job dispatch and termination performance and its framework performance against current mainstream solutions. To do this I had to build a reference platform to evaluate it against and write 3 benchmark programs that would exercise the framework.

Job Dispatch & Termination Performance Evaluation
Globus-based reference platform: Globus Toolkit, OpenPBS scheduler, MPICH-G2.
There is no one-to-one match with Agent Teamwork; instead, a number of products provide different sets of services. The Globus Toolkit (GTK) is the de facto standard for grid computing, but it is just a toolkit: a complete reference platform required integration with OpenPBS and MPICH.

Reference Platform Hardware
66 computers divided into 2 clusters. Agent Teamwork also runs on these same 66 computers. The hardware differs between the clusters, but because both platforms run on the same machines the comparisons are apples to apples.

Reference Platform Overview
I had to overcome a lot of challenges to get these components built, installed, and configured to the point where I could run a distributed job. This diagram provides a detailed overview of how the reference platform worked.
To run a job, you generate a job definition file using the RSL and submit it along with your user certificate.
The globusrun program parses the RSL and, in the case of a multi-cluster job, uses the DUROC library to coordinate a GRAM client for each cluster.
The GRAM client submits the job to a gatekeeper on the cluster head, which uses the GSI to authenticate and authorize the job submission. The gatekeeper then starts a job manager, which issues a callback to the GRAM client to connect standard error and standard output back to the client.
The job manager then submits the job details to the PBS server, which applies any policies to determine which queue to place the job in.
The PBS scheduler locates suitable nodes for the job and transfers the executable and any data files to the selected nodes.
PBS MOM launches the application.
Applications written in the MPICH-G2 framework make use of DUROC and the Grid Security Infrastructure to coordinate their cooperative parallel execution.

Reference Platform Challenges
Administrator access to machines. Host configuration and cryptic error messages such as "globus_init: failed": DNS vs. hosts files, inconsistent hosts files, inconsistent PTR records, inconsistent port ACLs. GTK authentication.
There was wide variance in system configuration parameters, and the platform components had dependencies on these configurations. Discovering the mismatches required strace and/or tcpdump, or attaching gdb.

Debugging
strace, tcpdump, gdb

Job Dispatching and Termination Function Evaluation
Not evaluating job execution performance.
Methodology: port an available test program to the MPICH-G2 framework; measure how long it takes a job submission to be deployed, executed, and cleaned up; run with 2 to 64 nodes across the two clusters in a depth-first node distribution series and a breadth-first node distribution series.
Because I was not evaluating job execution, I needed a lightweight test program, so I ported the professor's test program to MPICH.
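The two node distribution series can be sketched as follows. This is only an illustration under stated assumptions: the slides do not define the terms, so the sketch assumes "depth-first" means filling one cluster before using the other and "breadth-first" means alternating between the clusters. The class name NodeDistribution and the cluster capacities used in the usage note are hypothetical.

```java
/** Hypothetical helper: how many of n processes land on each of two clusters. */
final class NodeDistribution {
    private NodeDistribution() {}

    /** Depth-first (assumed meaning): fill cluster A before touching cluster B. */
    static int[] depthFirst(int n, int capA, int capB) {
        check(n, capA, capB);
        int onA = Math.min(n, capA);
        return new int[] {onA, n - onA};
    }

    /** Breadth-first (assumed meaning): alternate A, B, A, B, spilling over when one is full. */
    static int[] breadthFirst(int n, int capA, int capB) {
        check(n, capA, capB);
        int onA = 0;
        int onB = 0;
        for (int i = 0; i < n; i++) {
            boolean preferA = (i % 2 == 0);
            // Place on A when it is A's turn and A has room, or when B is already full.
            if ((preferA && onA < capA) || onB >= capB) {
                onA++;
            } else {
                onB++;
            }
        }
        return new int[] {onA, onB};
    }

    private static void check(int n, int capA, int capB) {
        if (n < 0 || capA < 0 || capB < 0 || n > capA + capB) {
            throw new IllegalArgumentException("n must fit within the two clusters");
        }
    }
}
```

With hypothetical capacities of 32 and 34 nodes, NodeDistribution.depthFirst(8, 32, 34) places all 8 processes on the first cluster, while NodeDistribution.breadthFirst(8, 32, 34) places 4 on each, so the breadth-first series crosses the inter-cluster network from the smallest job sizes.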

Results
10-second stair step challenge. The reference platform performance numbers would cluster, with occasional outliers. This was most pronounced in the breadth-first runs, making me suspect a higher sensitivity to something network related. AgentTeamwork is competitive.

Results

Results

Framework Function Evaluation
Framework issues: Agent Teamwork MPI implementation, MPICH-G2 (C++), mpiJava. MPI framework communication functions: initialization, barrier, broadcast, gather, scatter, etc. Goal: write 3 benchmark programs that have communication-intensive algorithms.
The second part of my effort was to evaluate the framework performance. The first issue was C++ vs. Java: the Agent Teamwork framework provides an MPI implementation, but Agent Teamwork is written entirely in Java, whereas MPICH-G2 is a C++ framework. To avoid comparing frameworks in different languages, I evaluated Agent Teamwork against the mpiJava framework, which is a popular Java MPI implementation.

Benchmark Programs
MD: a molecular dynamics simulation. Wave2D: a wave dissemination simulation. Mandelbrot: a Mandelbrot generator.
I coded each program twice, once for each framework, except for Mandelbrot, which one of the professor's students had already coded in mpiJava.
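As a rough illustration of what a benchmark like the Mandelbrot generator computes, here is a minimal sketch, not the benchmark code itself. It assumes the usual escape-time iteration and a block partition of image rows across ranks; the class name MandelbrotSketch and both method names are hypothetical, and the MPI gather of the rows at a root rank is left out so the block stays self-contained.

```java
/** Hypothetical sketch of the per-rank work in a row-partitioned Mandelbrot generator. */
final class MandelbrotSketch {
    private MandelbrotSketch() {}

    /** Escape-time count for the point c = cr + ci*i, capped at maxIter. */
    static int escapeIterations(double cr, double ci, int maxIter) {
        double zr = 0.0;
        double zi = 0.0;
        int iter = 0;
        // Iterate z = z^2 + c until |z| exceeds 2 or the cap is reached.
        while (iter < maxIter && zr * zr + zi * zi <= 4.0) {
            double nextZr = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = nextZr;
            iter++;
        }
        return iter;
    }

    /** Half-open row range {start, end} that a rank would compute under a block partition. */
    static int[] rowRange(int rank, int size, int height) {
        int base = height / size;
        int extra = height % size; // the first 'extra' ranks take one additional row
        int start = rank * base + Math.min(rank, extra);
        int end = start + base + (rank < extra ? 1 : 0);
        return new int[] {start, end};
    }
}
```

In a framework version, each rank would fill the rows from rowRange and a gather would collect them at one rank, which is where the communication cost being benchmarked comes in.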

Agent Teamwork Programming
Snapshots. Programming model: func_n.
int func_0 (String[] Args){ … return 1; }
int func_1 () {
Code maturity.
Agent Teamwork takes regular runtime snapshots of a program and is capable of migrating a running job from one node to another for load balancing and dynamic failure recovery. Java won't serialize the program counter and stack, so programs must be written in the func_n model. Race conditions resulting in deadlocks and other framework bugs prevented me from completing the framework evaluation.
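The idea behind the func_n model can be sketched in plain Java. This is not Agent Teamwork's API: the classes CounterJob and SnapshotDemo, the DONE sentinel, and the use of standard Java object serialization for the snapshot are all assumptions made for illustration. The sketch also assumes that each func_n returns the index of the next function to run, as the "return 1" followed by func_1 in the slide suggests, and it drops the String[] argument of func_0 for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical job in the func_n style: all state lives in fields, none on the stack. */
class CounterJob implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int DONE = -1; // assumed sentinel meaning "no more functions"
    int next = 0;               // index of the next func_n to run
    int sum = 0;

    int func_0() { sum = 10; return 1; }
    int func_1() { sum += 5; return 2; }
    int func_2() { sum *= 2; return DONE; }

    /** Runs one func_n; returns false once the job has finished. */
    boolean step() {
        switch (next) {
            case 0: next = func_0(); break;
            case 1: next = func_1(); break;
            case 2: next = func_2(); break;
            default: return false;
        }
        return next != DONE;
    }
}

final class SnapshotDemo {
    private SnapshotDemo() {}

    /** Serializes the job between two func_n calls, when no stack state is live. */
    static byte[] snapshot(CounterJob job) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(job);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the job from a snapshot, as a different node might after a failure. */
    static CounterJob restore(byte[] saved) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(saved))) {
            return (CounterJob) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs func_0, snapshots, then resumes from the snapshot and finishes the job. */
    static int runWithMigration() {
        CounterJob job = new CounterJob();
        job.step();                          // func_0
        CounterJob resumed = restore(snapshot(job));
        while (resumed.step()) {
            // func_1 and func_2 run on the restored copy
        }
        return resumed.sum;
    }
}
```

Because the only resume point is the boundary between two func_n calls, the saved next index stands in for the program counter that Java cannot serialize, which is why conventionally structured code has to be rewritten into this shape.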

Partial Results

Partial Results
Agent Teamwork was 2 orders of magnitude slower. I suspect this is related to snapshots and the size of the data set.

Future Work
Framework debugging. Develop a pre-processor to convert conventionally programmed code into the snapshot-able func_n model.

Skills Developed During Project
Significant experience with Globus, OpenPBS, and MPI. Extensive debugging with tcpdump, strace, and gdb. Experience with performance analysis and writing MPI programs. New insights into and understanding of HPDC.