8/25/2005, IEEE PacRim 2005: The Design Concept and Initial Implementation of AgentTeamwork Grid Computing Middleware. Munehiro Fukuda, Computing & Software.

Presentation transcript:

8/25/2005, IEEE PacRim
The Design Concept and Initial Implementation of AgentTeamwork Grid Computing Middleware
Munehiro Fukuda, Computing & Software Systems, University of Washington, Bothell
Koichi Kashiwagi, Shinya Kobayashi, Computer Science, Ehime University
Funded by

Background
Most grid-computing systems
  - Centralized resource/job management
  - Two drawbacks
      - A powerful central server essential to manage all slave computing nodes
      - Applications based on master-slave or parameter-sweep model
Mobile agents
  - An execution model previously highlighted as a prospective infrastructure of distributed systems
  - No more than an alternative approach to centralized grid middleware implementation
Our motivation
  - Decentralized job distribution and coordination
  - Decentralized fault tolerance
  - Applications based on a variety of communication models

Objective
A mobile agent execution platform fitted to grid computing
  - Allowing an agent to identify which MPI rank to handle and which agent to send a job snapshot to
Fault-tolerant inter-process communication
  - Recovering lost messages
  - Allowing over-gateway connections
Agent-collaborative algorithms for job coordination
  - Allocating computing nodes in a distributed manner
  - Implementing decentralized snapshot maintenance and job recovery

System Overview
[Figure: system diagram. Labels: User A, User B, FTP server, commander agent, resource agents, sentinel agents, bookkeeper agent, snapshots, results, TCP communication. Three processes are shown (two for User A, one for User B), each made up of a user program wrapper, snapshot methods, and GridTCP.]

Execution Layer
Software layers, in the order listed on the slide, starting from the operating system:
  1. Operating systems
  2. UWAgents mobile agent execution platform
  3. Commander, resource, sentinel, and bookkeeper agents
  4. User program wrapper
  5. GridTcp, Java socket
  6. mpiJava-A, mpiJava-S
  7. mpiJava API
  8. Java user applications

UWAgents Execution Platform
  - UWInject submits a new agent from the shell
  - An agent domain is created for each submission from the Unix shell
  - The number of children each agent can spawn is given upon the initial submission
  - No name server
  - Messages are forwarded through an agent tree
  - A user job is scheduled as a thread, using suspend/resume
[Figure: two agent domains. Domain labels: (time = 3:31pm, 8/25/05; ip = perseus.uwb.edu; name = fukuda) and (time = 3:30pm, 8/25/05; ip = medusa.uwb.edu; name = fukuda). One tree is submitted with -m 4 and shows agents id 0 through id 12; the other is submitted with -m 3 and shows agents id 0, id 1, and id 2. Other labels: User, UWPlace, a user job.]
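The agent ids shown on this slide are consistent with a simple numbering rule: with `-m 4`, the children of agent `i` take ids `4i` through `4i + 3`, and the root (id 0) spawns ids 1 to 3. The Python sketch below assumes that rule, which is inferred from the ids in the figure and not stated on the slide, to show how a message could be forwarded through the agent tree without a name server. The function names are hypothetical and are not part of UWAgents.

```python
def child_ids(agent_id, max_children):
    """Ids of the children of agent_id under the assumed numbering rule."""
    first = agent_id * max_children
    # The root's own id (0) falls in its child range, so it is skipped.
    return [i for i in range(first, first + max_children) if i != 0]


def parent_id(agent_id, max_children):
    """Id of the parent, or None for the root (id 0)."""
    if agent_id == 0:
        return None
    return agent_id // max_children


def path_to_root(agent_id, max_children):
    """Ids visited when walking from agent_id up to the root."""
    path = [agent_id]
    while path[-1] != 0:
        path.append(parent_id(path[-1], max_children))
    return path


def route(src, dst, max_children):
    """Hops a message takes from src to dst through the tree."""
    down = path_to_root(dst, max_children)
    ancestors_of_dst = set(down)
    hops = []
    # Climb from src until reaching an ancestor shared with dst.
    for agent in path_to_root(src, max_children):
        hops.append(agent)
        if agent in ancestors_of_dst:
            break
    # Descend from the shared ancestor to dst.
    common = hops[-1]
    hops.extend(reversed(down[:down.index(common)]))
    return hops


print(child_ids(0, 4))   # [1, 2, 3]
print(child_ids(2, 4))   # [8, 9, 10, 11]
print(route(5, 9, 4))    # [5, 1, 0, 2, 9]
```

Because every parent and child id can be computed from an agent's own id and the `-m` value, no lookup service is needed in this sketch, which matches the "No name server" point on the slide.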

Job Distribution
[Figure: agent tree for job distribution. Legend: id = agent id, rank = MPI rank. Arrow labels: job submission, XML query, spawn, snapshot. Top of the tree: User, Commander id 0, Resource id 1 (eXist).]

  MPI rank   Sentinel id   Bookkeeper id
  0          2             3
  1          8             12
  2          9             13
  3          10            14
  4          11            15
  5          32            48
  6          33            49
  7          34            50
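The id and rank pairs on this slide fit a breadth-first numbering of each subtree: the sentinel subtree is rooted at id 2 and the bookkeeper subtree at id 3, each agent spawns up to four children, and ranks are assigned level by level. The Python sketch below assumes that pattern, which is inferred from the figure and may not be how AgentTeamwork actually computes ranks, to show how an agent could derive its MPI rank from its own id alone. `rank_of` is a hypothetical name.

```python
def rank_of(agent_id, root_id, max_children=4):
    """MPI rank of agent_id within the subtree rooted at root_id.

    Assumes children of agent i have ids max_children * i and up,
    and that ranks are assigned breadth-first from the subtree root.
    """
    # Count how many levels separate the agent from the subtree root.
    level = 0
    ancestor = agent_id
    while ancestor != root_id:
        if ancestor < root_id:
            raise ValueError("agent is not in the subtree of root_id")
        ancestor //= max_children
        level += 1

    first_on_level = root_id * max_children ** level
    # Agents on all shallower levels: 1 + m + m^2 + ... + m^(level-1).
    agents_before = (max_children ** level - 1) // (max_children - 1)
    return agents_before + (agent_id - first_on_level)


print(rank_of(34, root_id=2))  # sentinel id 34 -> 7
print(rank_of(48, root_id=3))  # bookkeeper id 48 -> 5
```

Under this assumption a sentinel and the bookkeeper with the same rank can find each other by arithmetic, which is one way to meet the stated objective of letting an agent identify which MPI rank to handle and which agent to send a job snapshot to.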

Resource Allocation
[Figure: resource allocation. Labels: User, Commander id 0, Resource id 1, eXist, job submission, an XML query, a list of available nodes, spawn, future use.]
  - XML query fields: CPU architecture, OS, memory, disk, total nodes, multiplier
  - Number of nodes in the returned list: total nodes x multiplier
  - Case 1: total nodes = 2, multiplier = 1.5 (Node 0 to Node 2)
  - Case 2: total nodes = 2, multiplier = 3 (Node 0 to Node 5)
  - Agents shown in both cases: Sentinel id 2 rank 0, Sentinel id 8 rank 1, Bookkeeper id 2 rank 0, Bookkeeper id 12 rank 5
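The two cases on this slide can be checked with a line of arithmetic. The Python sketch below computes the size of the node list as total nodes x multiplier; rounding up for non-integer products is an assumption, since both cases on the slide happen to give whole numbers, and `nodes_to_request` is a hypothetical name.

```python
import math


def nodes_to_request(total_nodes, multiplier):
    """Size of the node list: total nodes x multiplier.

    Rounding up is an assumption; the slide only shows products
    that are already whole numbers.
    """
    if total_nodes < 1 or multiplier < 1:
        raise ValueError("total_nodes and multiplier must be at least 1")
    return math.ceil(total_nodes * multiplier)


print(nodes_to_request(2, 1.5))  # Case 1: 3 nodes (Node 0 to Node 2)
print(nodes_to_request(2, 3))    # Case 2: 6 nodes (Node 0 to Node 5)
```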

Job Resumption by a Parent Sentinel
[Figure: agents shown are Sentinel id 2 (rank 0), Sentinel id 8 (rank 1), Sentinel id 9 (rank 2), Sentinel id 10 (rank 3), Sentinel id 11 (rank 4), Bookkeeper id 15 (rank 4), and a new Sentinel id 11 (rank 4). Label: MPI connections.]
  (0) Send a new snapshot periodically
  (1) Detect a ping error
  (2) Search for the latest snapshot
  (3) Retrieve the snapshot
  (4) Send a new agent
  (5) Restart a user program
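Steps (0) to (5) on this slide can be sketched as a small in-memory simulation in Python. Everything here is hypothetical: the `Bookkeeper` class, the `resume_child` function, and the `ping` and `spawn` callbacks stand in for agent behavior the slide only names, and snapshots are modeled as plain dictionaries with a sequence number.

```python
class Bookkeeper:
    """In-memory stand-in for a bookkeeper agent holding snapshots per rank."""

    def __init__(self):
        self._snapshots = {}  # rank -> list of (sequence number, state)

    def store(self, rank, seq, state):
        # Step (0): a sentinel sends a new snapshot periodically.
        self._snapshots.setdefault(rank, []).append((seq, state))

    def latest(self, rank):
        # Steps (2) and (3): search for and retrieve the latest snapshot.
        history = self._snapshots.get(rank)
        if not history:
            return None
        return max(history, key=lambda item: item[0])


def resume_child(rank, ping, bookkeeper, spawn):
    """Parent-side recovery: returns the new agent, or None if the child is alive."""
    if ping(rank):
        return None                       # child answered, nothing to do
    # Step (1): a ping error was detected.
    snapshot = bookkeeper.latest(rank)
    state = snapshot[1] if snapshot else None
    # Steps (4) and (5): send a new agent that restarts the user program.
    return spawn(rank, state)


bookkeeper = Bookkeeper()
bookkeeper.store(4, seq=1, state={"iteration": 10})
bookkeeper.store(4, seq=2, state={"iteration": 20})

new_agent = resume_child(
    rank=4,
    ping=lambda rank: False,                       # simulate a crashed child
    bookkeeper=bookkeeper,
    spawn=lambda rank, state: ("sentinel", rank, state),
)
print(new_agent)  # ('sentinel', 4, {'iteration': 20})
```

In this sketch a child with no stored snapshot is respawned with `state=None`, meaning it would start from the beginning; the slide does not say what happens in that case.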

Job Resumption by a Child Sentinel
[Figure: agents shown are Commander id 0, Resource id 1, Sentinel id 2 (rank 0), Bookkeeper id 3 (rank 0), Sentinel id 8 (rank 1), and Bookkeeper id 12 (rank 1). Agents marked "New": Sentinel id 2 (rank 0) and Resource id 1.]
  (1) No pings for 8 * 5 (= 40 sec); no pings for 12 * 5 (= 60 sec)
  (2) Search for the latest snapshot
  (3) Search for the latest snapshot
  (4) Retrieve the snapshot
  (5) Send a new agent
  (6) No pings for 2 * 5 (= 10 sec)
  (7) Search for the latest snapshot
  (8) Search for the latest snapshot
  (9) Retrieve the snapshot
  (10) Send a new agent
  (11) Detect a ping error
  (12) Restart a new resource agent from its beginning
  (13) Detect a ping error and follow the same resumption procedure as on slide 9 (Job Resumption by a Parent Sentinel)
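The three timeouts on this slide (8 * 5 = 40 sec, 12 * 5 = 60 sec, 2 * 5 = 10 sec) all equal the agent id multiplied by 5 seconds. The Python sketch below assumes that rule holds in general, which the slide does not state outright, and uses hypothetical names. One plausible effect, also an assumption, is that agents with smaller ids react first, so that several agents do not try to resume the same parent at once.

```python
PING_UNIT_SECONDS = 5  # the factor that appears in all three examples on the slide


def ping_timeout(agent_id):
    """Seconds without pings before agent_id starts resuming its parent."""
    return agent_id * PING_UNIT_SECONDS


def reaction_order(agent_ids):
    """Agents sorted by how soon their timeout expires."""
    return sorted(agent_ids, key=ping_timeout)


print(ping_timeout(8))              # 40
print(ping_timeout(12))             # 60
print(ping_timeout(2))              # 10
print(reaction_order([12, 8, 2]))   # [2, 8, 12]
```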

[Figure: Computational Granularity 1]

[Figure: Computational Granularity 2]

[Figure: Computational Granularity 3]

[Figure: Performance Evaluation - Series]

[Figure: Performance Evaluation - RayTracer]

[Figure: Performance Evaluation - MolDyn]

[Figure: Overhead of Job Resumption]

Conclusions
Our focus
  - A decentralized job execution and fault-tolerant environment
  - Applications not restricted to the master-slave or parameter-sweeping model
Applications
  - 40,000 doubles x 10,000 floating-point operations
  - Moderate data transfer combined with massive/collective communication
  - At least three times larger than its computational granularity
Future work
  - UWAgents enhancement: over-gateway deployment and security
  - Programming support: preprocessor implementation
  - Job scheduling algorithms: priority-based agent migration