
5/25/2006, CSS Speaker Series
Parallel Job Deployment and Monitoring in a Hierarchy of Mobile Agents
Munehiro Fukuda, Computing & Software Systems, University of Washington, Bothell
Funded by the NSF Middleware Initiative

Outline
1. Introduction
2. Execution Model
3. System Design
4. Performance Evaluation
5. Related Work
6. Conclusions

1. Introduction
- Problems in Grid Computing
- Background of Mobile Agents
- Objective
- Project Overview

Quiet Laboratories
UW1-320 and UW1-302 at 3pm on a weekday: no more computing resources needed?

Demands for Computing Resources in Teaching
"I'm in the 320 lab testing my program, and for some reason, whenever I attempt to use 15 hosts, it asks me for passwords for hosts and then freezes and does nothing."
"I noticed that uw is being bogged down by zombie processes that someone left going on it. I went around looking at some of the other computers, and it's all over the place: user A has almost 40 processes running on uw since April 23rd, user B has about 10 on host 29, and there are a ton more on almost every host."
"I got tired of manually running a bunch of ssh commands to run rmi on many different machines."
"I have narrowed the problem down to three machines: 16, 20, and 30. First of all, uw is dead. It drops all incoming ssh connections. The other two, uw and uw, both have a mysterious problem that I don't know how to solve."

Demands for Computing Resources in Research
These are an effective way to collect numerous computing resources from all over the world. But here is a question: why don't they use the idle machines on their own campuses first?

Grid-Computing Brokers
- Desktops
  - Buyers: a desktop user
  - Sellers: hardware components
  - Brokers: Windows, Linux
- Clusters
  - Buyers: multiple users (e.g., CSS434 students)
  - Sellers: cluster computing nodes
  - Brokers: PBS, LSF
- Grid computing
  - Buyers: etc.
  - Brokers: Globus, Condor, Legion (Avaki), NetSolve, Ninf, Entropia, etc.
Okay, so there is no need to implement any more?

Problems in Grid Computing
- Targeting large business models
- A central entry point
- A lot of installation work
- Few system faults assumed
- Too gigantic

Our Target Network
- Targeting a group of computer users
- No central entry point
  - No central managers
  - No programming model restrictions
- Easy installation work
- Easy participation, which in turn makes fault tolerance necessary

Background of Mobile Agents
[Figure: a user, a central manager, and servers (FTP, HTTP, RPC, cycle) on the Internet]
Mobile agents are an execution model previously highlighted as a prospective infrastructure for distributed systems. If they are used only for static job deployment and result collection, they are no more than an alternative way to implement centralized grid middleware.
Our goal: let mobile agents do unique tasks in grid computing.

Objective
- Focus on a group of independent computers
  - Turned on and off independently
  - Not controlled by a scheduler such as PBS or LSF
  - Not managed by a central server
- Let mobile agents do unique tasks in grid computing
  - Runtime job migration: moving a program from a faulty/busy site to an active/idle site, seeking fault tolerance and better load balancing
  - Negotiation: negotiating with other agents about computing resources, seeking better load balancing
  - Inherent parallelism: deploying and monitoring jobs in parallel, for decentralized job management

Project Overview
- Funded by: NSF Middleware Initiative
- Sponsored by: University of Washington
- In collaboration with: Ehime University
- Team: UWB undergraduates

2. Execution Model
- System Overview
- Execution Layer
- Programming Environment

System Overview
[Figure: users A and B, an FTP server, and commander, resource, sentinel, and bookkeeper agents. User A's and user B's processes each run inside a user program wrapper (snapshot methods, GridTcp) and communicate over TCP; snapshots go to bookkeeper agents, and results return to the users.]

Execution Layer (from bottom to top)
- Operating systems
- UWAgents mobile agent execution platform
- Commander, resource, sentinel, and bookkeeper agents
- User program wrapper
- GridTcp | Java socket
- mpiJava-A | mpiJava-S
- mpiJava API
- Java user applications

MPI Java Programming

    public class MyApplication {
        public GridIpEntry ipEntry[];  // used by the GridTcp socket library
        public int funcId;             // used by the user program wrapper
        public GridTcp tcp;            // the GridTcp error-recoverable socket
        public int nprocess;           // number of processors
        public int myRank;             // processor id (MPI rank)

        public int func_0( String args[] ) {   // first function called (plays the role of a constructor)
            MPJ.Init( args, ipEntry, tcp );     // invoke mpiJava-A
            .....;                              // more statements to be inserted
            return 1;                           // the wrapper calls func_1( ) next
        }

        public int func_1( ) {                  // called after func_0
            if ( MPJ.COMM_WORLD.Rank( ) == 0 )
                MPJ.COMM_WORLD.Send( ... );
            else
                MPJ.COMM_WORLD.Recv( ... );
            .....;                              // more statements to be inserted
            return 2;                           // the wrapper calls func_2( ) next
        }

        public int func_2( ) {                  // called after func_1; the last function
            .....;                              // more statements to be inserted
            MPJ.Finalize( );                    // stops mpiJava-A
            return -2;                          // application terminated
        }
    }

3. System Design
- Mobile Agents
- Job Coordination
  - Distribution
  - Resource allocation and monitoring
  - Resumption and migration
- Programming Support
  - Language preprocessing
  - Communication check-pointing
- Inter-Cluster Job Deployment (current research topic)
  - Over-gateway agent migration
  - Over-gateway communication
  - Job distribution

UWAgents: Concept of Agent Domain
[Figure: a user runs UWInject, which submits a new agent from the shell into a UWPlace; each user job forms its own agent domain. Domains are identified by submission time, IP address, and user name: (time = 3:30pm, 8/25/05; ip = medusa.uwb.edu; name = fukuda) and (time = 3:31pm, 8/25/05; ip = perseus.uwb.edu; name = fukuda). One domain, started with -m 4, holds agent id 0, its children ids 1 to 3, and descendants ids 4 to 12; the other, started with -m 3, holds ids 0 to 2.]
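The agent IDs drawn here (and on the following slides) fit a simple numbering rule. Suppose each agent has at most m children, where the -m option appears to set m. Then child k of agent i receives id i × m + k, and the root skips k = 0 because id 0 is its own. The Java sketch below encodes that reading. It is an inference from the numbers on the slides, not the UWAgents API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AgentIds {

    // Ids that agent `id` may give its children when each agent has at most m children
    // (inferred rule: child k of agent i is i * m + k; the root skips k = 0).
    public static List<Integer> childIds(int id, int m) {
        if (m < 2) throw new IllegalArgumentException("m must be at least 2");
        List<Integer> children = new ArrayList<>();
        for (int k = (id == 0 ? 1 : 0); k < m; k++) {
            children.add(id * m + k);
        }
        return children;
    }

    // Parent id under the same rule; the root agent has no parent.
    public static int parentId(int id, int m) {
        if (id == 0) throw new IllegalArgumentException("the root agent has no parent");
        return id / m;
    }

    public static void main(String[] args) {
        System.out.println(childIds(0, 4));  // [1, 2, 3]
        System.out.println(childIds(2, 4));  // [8, 9, 10, 11]
        System.out.println(parentId(12, 4)); // 3
        System.out.println(childIds(0, 3));  // [1, 2]
    }
}
```

Under this rule, any agent can compute its parent's id and its children's ids locally, without asking a central registry.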

Job Distribution
[Figure: the user submits a job to the commander (id 0), which sends an XML query to the resource agent (id 1, backed by eXist, with sensor agents id 4 and id 5) and spawns sentinel and bookkeeper agents. Sentinels: id 2 (rank 0), ids 8 to 11 (ranks 1 to 4), ids 32 to 34 (ranks 5 to 7). Bookkeepers: id 3 (rank 0), ids 12 to 15 (ranks 1 to 4), ids 48 to 50 (ranks 5 to 7). Sentinels send snapshots to bookkeepers. id: agent id; rank: MPI rank.]
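The ranks in this figure match a breadth-first numbering of each subtree. Sentinel id 2 is rank 0, its children 8 to 11 are ranks 1 to 4, and its grandchildren from id 32 onward continue at rank 5. Bookkeepers rooted at id 3 follow the same pattern. The following is a minimal sketch that recovers a rank from an agent id. It assumes each agent has up to m = 4 children numbered id × m + k, which is an inference from the ids shown rather than documented AgentTeamwork behavior; SentinelRanks and rankOf are hypothetical names.

```java
public class SentinelRanks {

    // MPI rank of agent `id` inside the subtree rooted at `rootId`, assuming up to m children
    // per agent numbered id * m + k, and ranks handed out level by level (breadth-first).
    public static int rankOf(int id, int rootId, int m) {
        if (rootId <= 0 || m < 2) throw new IllegalArgumentException("need rootId > 0 and m >= 2");
        int levelStart = rootId; // smallest id on the current level
        int levelSize = 1;       // number of ids on the current level
        int ranksBefore = 0;     // ranks already used by shallower levels
        while (id >= levelStart + levelSize) {
            ranksBefore += levelSize;
            levelStart *= m;     // children of [a, a + n) occupy [a * m, (a + n) * m)
            levelSize *= m;
        }
        if (id < levelStart) {
            throw new IllegalArgumentException("agent " + id + " is not in the subtree of " + rootId);
        }
        return ranksBefore + (id - levelStart);
    }

    public static void main(String[] args) {
        for (int id : new int[] {2, 8, 11, 32, 47}) {
            System.out.println("sentinel id " + id + " -> rank " + rankOf(id, 2, 4)); // 0, 1, 4, 5, 20
        }
        for (int id : new int[] {3, 12, 48}) {
            System.out.println("bookkeeper id " + id + " -> rank " + rankOf(id, 3, 4)); // 0, 1, 5
        }
    }
}
```

Because the rank is a pure function of the id, a sentinel and its bookkeeper can agree on which rank they serve without exchanging messages.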

Resource Allocation and Monitoring
[Figure: the user submits a job to the commander (id 0), which sends the resource agent (id 1) an XML query specifying CPU architecture, OS, memory, disk, total nodes, and multiplier. The resource agent keeps its own XML DB (eXist), filled with performance data that sensor agents (ids 4, 5, and 16 to 23) measure with ttcp, and returns a list of total nodes × multiplier available nodes. Case 1: total nodes = 2, multiplier = 1.5 (nodes 0 to 2). Case 2: total nodes = 2, multiplier = 3 (nodes 0 to 5). Sentinels id 2 (rank 0) and id 8 (rank 1) and their bookkeepers are spawned; the remaining nodes are kept for future use.]
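The two cases on this slide amount to requesting total nodes × multiplier candidates, deploying on the first total nodes, and keeping the rest for future use. A minimal Java sketch follows. It assumes the count is rounded up and treats "matching" as "same OS and at least the requested memory"; the real query also covers CPU architecture and disk. Node, candidateCount, and allocate are hypothetical names, not AgentTeamwork code.

```java
import java.util.ArrayList;
import java.util.List;

public class ResourceAllocation {

    // Hypothetical node description; the resource agent keeps such data in its XML DB.
    public static class Node {
        public final String name;
        public final String os;
        public final int memoryMb;

        public Node(String name, String os, int memoryMb) {
            this.name = name;
            this.os = os;
            this.memoryMb = memoryMb;
        }

        @Override
        public String toString() { return name; }
    }

    // Candidates to request: total nodes x multiplier, rounded up (the rounding is an assumption).
    public static int candidateCount(int totalNodes, double multiplier) {
        return (int) Math.ceil(totalNodes * multiplier);
    }

    // Picks up to candidateCount matching nodes in list order; the caller deploys on the
    // first totalNodes entries and keeps the rest as spares for future use.
    public static List<Node> allocate(List<Node> available, String os, int minMemoryMb,
                                      int totalNodes, double multiplier) {
        int wanted = candidateCount(totalNodes, multiplier);
        List<Node> chosen = new ArrayList<>();
        for (Node n : available) {
            if (chosen.size() == wanted) {
                break;
            }
            if (n.os.equals(os) && n.memoryMb >= minMemoryMb) {
                chosen.add(n);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> pool = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            pool.add(new Node("node" + i, "Linux", 512));
        }
        List<Node> case1 = allocate(pool, "Linux", 256, 2, 1.5); // 3 candidates
        List<Node> case2 = allocate(pool, "Linux", 256, 2, 3.0); // 6 candidates
        System.out.println("case 1: deploy " + case1.subList(0, 2) + ", spares " + case1.subList(2, case1.size()));
        System.out.println("case 2: deploy " + case2.subList(0, 2) + ", spares " + case2.subList(2, case2.size()));
    }
}
```

A larger multiplier costs nothing at submission time but gives later migrations more idle destinations to choose from.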

Job Resumption by a Parent Sentinel
[Figure: sentinels id 2 (rank 0), id 8 (rank 1), and ids 9 to 11 (ranks 2 to 4) linked by MPI connections; bookkeeper id 15 (rank 4).]
(0) Each sentinel sends a new snapshot to its bookkeeper periodically.
(1) The parent sentinel detects a ping error from its child, sentinel id 11 (rank 4).
(2) It searches for the latest snapshot of rank 4 (bookkeeper id 15).
(3) It retrieves the snapshot.
(4) It sends a new agent: a new sentinel id 11 (rank 4).
(5) The new sentinel restarts the user program from the snapshot.
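Steps (0) to (5) can be sketched as a single monitoring round run by the parent sentinel. Everything here is an in-memory stand-in chosen for illustration. The Bookkeeper map, the Sentinel class, and the string "snapshots" are hypothetical, a single bookkeeper object replaces the bookkeeper tree, and a real sentinel pings over the network and ships serialized snapshots.

```java
import java.util.HashMap;
import java.util.Map;

public class ParentResumption {

    // Stand-in for bookkeeper agents: keeps only the latest snapshot of each MPI rank.
    public static class Bookkeeper {
        private final Map<Integer, String> latest = new HashMap<>();
        public void store(int rank, String snapshot) { latest.put(rank, snapshot); } // step (0)
        public String latestSnapshot(int rank) { return latest.get(rank); }         // steps (2) and (3)
    }

    // Stand-in for a child sentinel running one MPI rank.
    public static class Sentinel {
        public final int id;
        public final int rank;
        public final String restoredFrom; // snapshot it resumed from, or null if started fresh
        public boolean alive = true;

        public Sentinel(int id, int rank, String restoredFrom) {
            this.id = id;
            this.rank = rank;
            this.restoredFrom = restoredFrom;
        }

        public boolean ping() { return alive; }
    }

    // One monitoring round by the parent: keep a child that answers the ping; otherwise
    // create a replacement with the same id and rank, restarted from the latest snapshot.
    public static Sentinel monitor(Sentinel child, Bookkeeper bookkeeper) {
        if (child.ping()) {
            return child;
        }
        String snapshot = bookkeeper.latestSnapshot(child.rank); // (1) ping error; (2)-(3) find and fetch
        return new Sentinel(child.id, child.rank, snapshot);      // (4) send a new agent; (5) restart
    }

    public static void main(String[] args) {
        Bookkeeper bookkeeper = new Bookkeeper();
        Sentinel child = new Sentinel(11, 4, null);
        bookkeeper.store(4, "snapshot #1 of rank 4");
        bookkeeper.store(4, "snapshot #2 of rank 4");
        child.alive = false; // simulate the crash of sentinel id 11
        Sentinel replacement = monitor(child, bookkeeper);
        System.out.println("id " + replacement.id + ", rank " + replacement.rank + ", from " + replacement.restoredFrom);
    }
}
```

The replacement keeps the crashed agent's id and rank, so the rest of the tree and the MPI communicator can keep addressing it unchanged.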

Job Resumption by a Child Sentinel
[Figure: commander (id 0), resource (id 1), sentinels id 2 (rank 0) and id 8 (rank 1), and bookkeepers id 3 (rank 0) and id 12 (rank 1); a new sentinel id 2, a new commander id 0, and a new resource agent id 1 are created during recovery.]
(1) Sentinel id 8 receives no pings for 8 × 5 = 40 sec (bookkeeper id 12 would wait 12 × 5 = 60 sec).
(2) Search for the latest snapshot.
(3) Search for the latest snapshot.
(4) Retrieve the snapshot.
(5) Send a new agent: a new sentinel id 2 (rank 0).
(6) The new sentinel id 2 receives no pings for 2 × 5 = 10 sec.
(7) Search for the latest snapshot.
(8) Search for the latest snapshot.
(9) Retrieve the snapshot.
(10) Send a new agent: a new commander id 0.
(11) Detect a ping error.
(12) Restart a new resource agent from its beginning.
(13) Detect a ping error and follow the same child resumption procedure as in p9.
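The waiting times on this slide (10, 40, and 60 seconds for agents 2, 8, and 12) suggest a timeout of agent id × 5 seconds. One plausible reading is that this staggering lets the lowest-numbered child of a dead parent act first, so that its siblings see the resumed parent's pings before their own, longer timeouts expire. The sketch below only computes that rule. The constant and both method names are assumptions read off the figure, not AgentTeamwork code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PingTimeouts {

    // Read off the slide (an assumption, not a documented constant): wait agentId * 5 seconds.
    static final int SECONDS_PER_ID = 5;

    // How long an agent waits without pings from its parent before starting recovery.
    public static int timeoutSeconds(int agentId) {
        return agentId * SECONDS_PER_ID;
    }

    // Among siblings that lose their parent at the same moment, the one whose timeout expires first.
    public static int firstToAct(List<Integer> siblingIds) {
        return Collections.min(siblingIds, Comparator.comparingInt(PingTimeouts::timeoutSeconds));
    }

    public static void main(String[] args) {
        for (int id : new int[] {2, 8, 12}) {
            System.out.println("agent id " + id + " waits " + timeoutSeconds(id) + " sec"); // 10, 40, 60
        }
        System.out.println("first of siblings 8-11 to act: " + firstToAct(Arrays.asList(8, 9, 10, 11))); // 8
    }
}
```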

User Program Wrapper
Source code:

    statement_1; statement_2; statement_3;
    check_point( );
    statement_4; statement_5; statement_6;
    check_point( );
    statement_7; statement_8; statement_9;
    check_point( );

Preprocessed code:

    func_0( ) { statement_1; statement_2; statement_3; return 1; }
    func_1( ) { statement_4; statement_5; statement_6; return 2; }
    func_2( ) { statement_7; statement_8; statement_9; return -2; }

User program wrapper:

    int func_id = 0;
    while ( func_id != -2 ) {
        switch ( func_id ) {
            case 0: func_id = func_0( ); break;
            case 1: func_id = func_1( ); break;
            case 2: func_id = func_2( ); break;
        }
        check_point( );
    }

    check_point( ) {
        // save this object, including func_id, into a file
    }

The wrapper also includes cryptography.
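The wrapper loop and check_point( ) on this slide can be made runnable with standard Java object serialization. The slide only says the object, including func_id, is saved to a file; choosing ObjectOutputStream for that, and keeping snapshots in memory instead of a file, are assumptions of this sketch. WrapperSketch, UserProgram, and the maxCalls crash switch are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class WrapperSketch {

    // A preprocessed user program: each segment between check_point( ) calls became func_N,
    // which returns the id of the next function, or -2 when the program is done.
    public static class UserProgram implements Serializable {
        private static final long serialVersionUID = 1L;
        public int funcId = 0;                         // next function to run; saved in every snapshot
        public List<String> trace = new ArrayList<>(); // stands in for the program's own data

        int func_0() { trace.add("statements 1-3"); return 1; }
        int func_1() { trace.add("statements 4-6"); return 2; }
        int func_2() { trace.add("statements 7-9"); return -2; }
    }

    // check_point( ): serialize the whole object, funcId included (into memory here, not a file).
    public static byte[] checkPoint(UserProgram p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the program object from a snapshot, e.g. on the node where a job resumes.
    public static UserProgram restore(byte[] snapshot) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (UserProgram) in.readObject();
        }
    }

    // The wrapper loop: run func_N until one returns -2, snapshotting after each call.
    // maxCalls stops the loop early to simulate a crash (-1 means no limit).
    public static List<byte[]> run(UserProgram p, int maxCalls) throws IOException {
        List<byte[]> snapshots = new ArrayList<>();
        int calls = 0;
        while (p.funcId != -2 && calls != maxCalls) {
            switch (p.funcId) {
                case 0: p.funcId = p.func_0(); break;
                case 1: p.funcId = p.func_1(); break;
                case 2: p.funcId = p.func_2(); break;
                default: throw new IllegalStateException("unknown function id " + p.funcId);
            }
            snapshots.add(checkPoint(p));
            calls++;
        }
        return snapshots;
    }

    public static void main(String[] args) throws Exception {
        UserProgram original = new UserProgram();
        List<byte[]> snapshots = run(original, 2);                          // "crash" after func_1
        UserProgram resumed = restore(snapshots.get(snapshots.size() - 1)); // latest snapshot has funcId 2
        run(resumed, -1);                                                   // only func_2 runs now
        System.out.println(resumed.trace); // [statements 1-3, statements 4-6, statements 7-9]
    }
}
```

Because funcId travels inside the snapshot, a resumed process skips the functions that already completed, which is the point of cutting the program at check_point( ) boundaries.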

Preprocessor and Its Drawbacks
- No recursion
- Useless source line numbers indicated upon errors
- Explicit snapshot points are still needed

Source code:

    statement_1;
    statement_2;
    statement_3;
    check_point( );
    while ( ... ) {
        statement_4;
        if ( ... ) {
            statement_5;
            check_point( );
            statement_6;
        } else
            statement_7;
        statement_8;
    }
    check_point( );

Preprocessed code:

    int func_0( ) {
        statement_1;
        statement_2;
        statement_3;
        return 1;
    }

    int func_1( ) {          // before check_point( ) in the if-clause
        while ( ... ) {
            statement_4;
            if ( ... ) {
                statement_5;
                return 2;
            } else
                statement_7;
            statement_8;
        }
        ...;                 // loop exit: code following the last check_point( )
    }

    int func_2( ) {          // after check_point( ) in the if-clause
        statement_6;
        statement_8;
        while ( ... ) {
            statement_4;
            if ( ... ) {
                statement_5;
                return 2;
            } else
                statement_7;
            statement_8;
        }
        ...;                 // loop exit: code following the last check_point( )
    }
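For straight-line code, what the preprocessor does is mechanical: cut at each check_point( ) and end every piece with the id of the next function, or -2 for the last one. The Java sketch below handles only that case and works on a list of statement strings. A real preprocessor has to work on parsed Java, and loops and branches around check_point( ), as in the example above, are exactly what force the duplicated code. CheckPointSplitter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CheckPointSplitter {

    // Splits straight-line statements at "check_point( );" markers into func_N bodies.
    // Each body returns N + 1, except the last, which returns -2 (program finished).
    public static List<String> split(List<String> statements) {
        List<List<String>> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String s : statements) {
            if (s.equals("check_point( );")) {
                segments.add(current);
                current = new ArrayList<>();
            } else {
                current.add(s);
            }
        }
        if (!current.isEmpty()) {
            segments.add(current); // statements after the last check point
        }
        List<String> funcs = new ArrayList<>();
        for (int i = 0; i < segments.size(); i++) {
            int next = (i == segments.size() - 1) ? -2 : i + 1;
            StringBuilder body = new StringBuilder("int func_" + i + "( ) { ");
            for (String s : segments.get(i)) {
                body.append(s).append(' ');
            }
            body.append("return ").append(next).append("; }");
            funcs.add(body.toString());
        }
        return funcs;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "statement_1;", "statement_2;", "statement_3;", "check_point( );",
                "statement_4;", "statement_5;", "statement_6;", "check_point( );",
                "statement_7;", "statement_8;", "statement_9;", "check_point( );");
        for (String f : split(source)) {
            System.out.println(f);
        }
    }
}
```

Run on the straight-line example from the wrapper slide, it prints the same three functions shown there, ending with return 1, return 2, and return -2.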

GridTcp: Check-Pointed Connection
[Figure: user programs on n1.uwb.edu, n2.uwb.edu, and n3.uwb.edu, connected by TCP. Each runs inside a user program wrapper that maintains snapshots and keeps incoming, outgoing, and backup queues plus a rank/IP table (rank 1: n1.uwb.edu, rank 2: n2.uwb.edu; after the migration, rank 2: n3.uwb.edu).]
- Outgoing packets are saved in a backup queue.
- All packets are serialized into a backup file at every checkpoint.
- Upon a migration:
  - Packets are de-serialized from the backup file.
  - Backup packets are restored in the outgoing queue.
  - The IP table is updated.
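These rules can be sketched with standard Java serialization. This is not GridTcp's API. The class names are hypothetical, and a byte array stands in for the backup file. The slide does not say when backed-up packets are discarded, so this sketch keeps them all; a real implementation would need some way to trim the queue, and receivers would have to tolerate resent packets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BackupQueueSketch {

    // A message to another MPI rank (hypothetical representation).
    public static class Packet implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int destRank;
        public final byte[] payload;
        public Packet(int destRank, byte[] payload) { this.destRank = destRank; this.payload = payload; }
    }

    // Connection state of one process: rank-to-host table plus outgoing and backup queues.
    public static class ConnectionState implements Serializable {
        private static final long serialVersionUID = 1L;
        public final Map<Integer, String> ipTable = new HashMap<>(); // rank -> host
        public final Deque<Packet> outgoing = new ArrayDeque<>();    // waiting to be written to a socket
        public final List<Packet> backup = new ArrayList<>();        // copy of every packet sent

        public void send(Packet p) {
            outgoing.add(p);
            backup.add(p); // outgoing packets are also saved in the backup queue
        }
    }

    // At every checkpoint: serialize the state, backup packets included.
    public static byte[] checkPoint(ConnectionState s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        return bytes.toByteArray();
    }

    // Upon migration: de-serialize, then put every backup packet back into the outgoing queue.
    public static ConnectionState restore(byte[] backupFile) throws IOException, ClassNotFoundException {
        ConnectionState s;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(backupFile))) {
            s = (ConnectionState) in.readObject();
        }
        s.outgoing.clear();
        s.outgoing.addAll(s.backup);
        return s;
    }

    // Records that a rank now runs on a different host.
    public static void updateLocation(ConnectionState s, int rank, String newHost) {
        s.ipTable.put(rank, newHost);
    }

    public static void main(String[] args) throws Exception {
        ConnectionState rank2 = new ConnectionState();
        rank2.ipTable.put(1, "n1.uwb.edu");
        rank2.ipTable.put(2, "n2.uwb.edu");
        rank2.send(new Packet(1, "hello".getBytes(StandardCharsets.UTF_8)));
        rank2.outgoing.poll();                       // pretend the packet was already written to the socket
        byte[] backupFile = checkPoint(rank2);

        ConnectionState migrated = restore(backupFile); // rank 2 moves from n2.uwb.edu to n3.uwb.edu
        updateLocation(migrated, 2, "n3.uwb.edu");
        System.out.println(migrated.outgoing.size() + " packet(s) to resend; rank 2 now at " + migrated.ipTable.get(2));
    }
}
```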

Inter-Cluster Job Deployment (current research topic)
- Over-gateway agent deployment
- Over-gateway TCP communication
- Over-gateway agent tree creation
[Figure: medusa.uwb.edu and two gateway hosts in uwb.edu on the Internet, in front of a private domain; commander id 0 and sentinels id 2, id 8, and id 9 must be placed across them. How?]

UWAgents: Over-Gateway Migration
[Figure: medusa.uwb.edu and two gateway hosts in uwb.edu on the Internet, and a private domain with mnode0, mnode1, and mnode4; agents id 0 and id 1; operations spawnChild( ), hop( ), and talk( ).]
- Parent and children keep track of a route to each other's current position.
- A daemon maintains where a gateway is.

GridTcp: Over-Gateway Connection
[Figure: commander id 0 and sentinels id 2 (rank 0), id 8 (rank 1), and id 9 (rank 2) spread across medusa.uwb.edu, the uwb.edu gateways, and the private-domain nodes mnode0, mnode1, and mnode4. Each user program runs inside a user program wrapper that keeps a table of rank, destination, and gateway.]
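The per-wrapper table on this slide maps each rank to a destination host and, when needed, a gateway. The following is a minimal sketch of how such a table could pick the host to open a TCP connection to. The class, the method names, and the placeholder "gateway-host" are hypothetical (the real gateway hostnames are not legible on the slide), and the rank-to-host placement in main is illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

public class OverGatewayTable {

    // One row of the table: destination host and optional gateway.
    public static class Entry {
        public final String dest;
        public final String gateway; // null when the destination is directly reachable
        public Entry(String dest, String gateway) { this.dest = dest; this.gateway = gateway; }
    }

    private final Map<Integer, Entry> table = new HashMap<>();

    public void put(int rank, String dest, String gateway) {
        table.put(rank, new Entry(dest, gateway));
    }

    // Host to open the TCP connection to: the gateway if one is listed, else the destination.
    public String nextHop(int rank) {
        Entry e = table.get(rank);
        if (e == null) throw new IllegalArgumentException("unknown rank " + rank);
        return e.gateway != null ? e.gateway : e.dest;
    }

    public static void main(String[] args) {
        OverGatewayTable onMedusa = new OverGatewayTable();
        // Illustrative placement: rank 0 on the public host, ranks 1 and 2 behind a gateway.
        onMedusa.put(0, "medusa.uwb.edu", null);
        onMedusa.put(1, "mnode0", "gateway-host"); // "gateway-host" is a placeholder name
        onMedusa.put(2, "mnode1", "gateway-host");
        for (int rank = 0; rank <= 2; rank++) {
            System.out.println("rank " + rank + ": connect to " + onMedusa.nextHop(rank));
        }
    }
}
```

Each process keeps its own table, so a process already inside the private domain would list mnode1 without a gateway. The gateway, in turn, still needs the destination field to forward the connection onward.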

Over-Gateway Agent Tree Creation: Possible Solutions
[Figure: user, commander id 0, resource id 1, bookkeeper id 3 (rank 0), and the sentinel tree (id 2, rank 0; ids 8 to 11, ranks 1 to 4; ids 32 to 35, ranks 5 to 8; ...; ids 46 and 47, ranks 19 and 20) laid over clusters 0, 1, and 2 and divided into partitions 1 and 2.]

Over-Gateway Agent Tree Creation: Final Solution
[Figure: user, commander id 0, resource id 1, bookkeeper id 3 (rank 0), and sentinel id 2 spread over clusters 0 to 3, cluster gateway 0, cluster gateways 1, 2, and 3, and desktop computers. Sentinels id 8, 33, 34, and 35 take negative ranks (-8, -33, -34, -35). Other sentinels take consecutive ranks: id 32 (rank 0), ids 128 to 131 (ranks 1 to 4), id 512 (rank 5), id 132 (rank 6), and ids 528 to 531 (ranks 7 to 10). Desktop sentinels take ranks from X: id 9 (rank X) and ids 36 to 39 (ranks X+1 to X+4).]

4. Performance Evaluation
Evaluation environment:
- An 8-node Myrinet-2000 cluster: 2.8 GHz Pentium 4 Xeon with 512 MB
- A 24-node Gigabit Ethernet cluster: 3.4 GHz Pentium 4 Xeon with 512 MB
Evaluated items:
- Computation granularity
- Java Grande MPJ benchmark
- Process resumption overhead
- File transfer

Computational Granularity 1: Master-Slave Computation
[Figure: a master and slaves with communication between them]

Computational Granularity 2: Heartbeat Communication
[Figure: processes with communication between them]

Computational Granularity 3: All-to-All Broadcast
[Figure: processes with communication between them]

Performance Evaluation: Series
Master-slave computation

Performance Evaluation: RayTracer
All-reduce communication, but little data to send

Performance Evaluation: MolDyn
All-to-all broadcast

Overhead of Job Resumption

File Transfer
[Figure: pipelined transfer in AgentTeamwork along the agent tree (user, commander id 0, resource id 1, bookkeeper id 3 with rank 0, and sentinels id 2 to id 47 with ranks 0 to 20); chart: AgentTeamwork vs. NFS.]

5. Related Work
From the viewpoints of:
- System architecture
- Fault tolerance
- Job deployment and monitoring

System Architecture
- Globus: a toolkit
- Condor: process migration
- Ninf, NetSolve: RPC
- Legion (Avaki): object-oriented (OO)
- Catalina, J-SEAL2, AgentTeamwork: mobile agents
Differences from Catalina/J-SEAL2:
- They are not fully implemented.
- They are based on a master-slave model.

Fault Tolerance
System | Library | Data recovery | Communication recovery
Legion (Avaki) | FT-MPI | Variables passed to MPI_FT_save( ) | Links recovered
Condor | MW library | All master data | Master-worker communication
Dome | Dome_env | Objects declared as dXXX | N/A
AgentTeamwork | GridTcp | All serializable class data | All in-transit messages

Job Deployment and Monitoring
System | Co-allocation module | Deployment scheme
Globus | DUROC | Master-slave
Condor | Grid Manager | Master-slave
Legion | Scheduler and Enactor | Master-slave
AgentTeamwork | Sentinel agents | Hierarchical

6. Conclusions
- Project Summary
- Next Two Years

Project Summary
Applications:
- Computation granularity: 40,000 doubles × 10,000 floating-point operations
- Message transfer: any type except all-to-all communication
- Entire application size: 3+ times larger than the computation granularity
Current status:
- UWAgent: completed
- Agent behavioral design: basic job deployment/resumption implemented
- User program wrapper: completed, including security features
- GridTcp/mpiJava: in testing
- Preprocessor: almost completed
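Below is a rough size check of the granularity figure. It assumes 8-byte Java doubles and reads "40,000 doubles × 10,000 floating-point operations" as 10,000 operations applied to each of 40,000 doubles; the slide does not spell out this interpretation.

```latex
\begin{aligned}
40{,}000 \times 8\ \text{B} &= 320{,}000\ \text{B} \approx 312.5\ \text{KiB} && \text{(data per process)}\\
40{,}000 \times 10{,}000 &= 4 \times 10^{8}\ \text{floating-point operations} && \text{(computation granularity)}\\
3 \times 4 \times 10^{8} &= 1.2 \times 10^{9}\ \text{operations} && \text{(smallest entire application, 3+ times)}
\end{aligned}
```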

Next Two Years
- Application support
  - Fault tolerance in file transfer
  - GUI improvement
- Agent algorithms
  - Over-gateway application deployment
  - Dynamic resource allocation and monitoring
  - Priority-based agent migration
- Performance evaluation
- Dissemination

Can AgentTeamwork Become Their Competitor?
[Figure: AgentTeamwork vs. Nimrod]

Questions?

MPJ.Send and Recv Performance

Mobile Agents
Mobile agents | Naming | Cascading termination | Job scheduling | Security
IBM Aglets | AgletFinder traces all agents | Needs to retract agents one by one | Schedules jobs with Baglets | Java byte-code verification
Voyager | RPC-based, system-unique agent IDs | Needs to be implemented at the user level | Launches an independent user process | CORBA security service
D'Agent | Unpredictable agent IDs | Needs to be implemented at the user level | Launches an independent user process | A currency-based model
Ara (obsolete) | Unpredictable agent IDs | Calls ara_kill to kill all agents | Launches an independent user process | An allowance model
UWAgent | Agent domain | Waits for all descendants' termination | Schedules jobs with Java thread functions | Agent-to-agent security with agent domain