PTools Annual Meeting, Knoxville, TN, 10-12 September 2002
The Tool Daemon Protocol: Defining the Interface Between Tools and Process Management Systems

Slide 1: Title
Paradyn Group and Condor Group, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, USA
Ana Cortés and Miquel A. Senar, Departament d'Informàtica, Universitat Autònoma de Barcelona, Barcelona, Spain
Presented by Philip C. Roth

Slide 2: The Current Situation
Consider a job submitted to a process management system (e.g., Condor, PBS, Globus, MPICH's MPD). The process manager:
- starts the job's processes, setting up file I/O and standard I/O;
- monitors process status;
- controls the job.
[Figure: a process manager daemon monitoring and controlling two application processes]

Slide 3: The Current Situation
Next, consider a tool that wants to monitor the job. The tool:
- also may want to start the processes (or attach to them);
- also needs to monitor process status;
- also may want to control the job;
- also may want access to file I/O or standard I/O;
- needs to communicate with its front-end.
[Figure: a process manager daemon and a tool daemon both reaching for the same two application processes, with question marks over who monitors and controls them]

Slide 4: The Current Situation
So, who wins?
[Figure: the same process manager daemon and tool daemon contending for the application processes]

Slide 5: The Current Situation
Process managers are many and varied: IBM POE, SGI Origin MPI, and MPICH, for example, all work differently. Some process managers support specific tools, e.g., MPICH's support for the TotalView debugger. We are heading for an m × n combination of m process managers and n tools.
Bottom line: we need a standard interface that lets process managers and tools coexist. That interface is the Tool Daemon Protocol (TDP).
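
The m × n argument is simple arithmetic: if every tool is adapted to every process manager separately, the number of integrations is the product m × n; if each side instead implements one shared protocol once, it is m + n. The sketch below just counts both. The function name and the example lists are illustrative assumptions (the names are taken from these slides, not from any survey of the field).

```python
def integration_counts(process_managers, tools):
    """Return (pairwise, with_standard) integration counts.

    pairwise:      each tool is adapted to each process manager separately (m * n).
    with_standard: each system implements one shared protocol once (m + n).
    """
    m, n = len(process_managers), len(tools)
    return m * n, m + n


# Illustrative lists only: names mentioned in these slides, not a complete survey.
managers = ["Condor", "PBS", "Globus", "MPICH MPD", "IBM POE", "SGI Origin MPI"]
tools = ["Paradyn", "TotalView"]
print(integration_counts(managers, tools))  # (12, 8)
```

The gap widens quickly as both sides grow: ten process managers and ten tools would need 100 pairwise integrations but only 20 protocol implementations.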

Slide 6: TDP: The Tool Daemon Protocol
Defines an API between the process management system and tool processes for:
1. creating processes;
2. controlling processes;
3. sharing information between processes.
Pilot implementation: trying out ideas to see what works.

Slide 7: TDP Job Startup Sequence
1. The tool submits a job request to the process management system.
[Figure: the tool front-end on the local host sends "create job" to the process manager daemon on the execution host]

Slide 8: TDP Job Startup Sequence
2. The process manager creates the application processes, leaving them suspended ("pause on exec").
[Figure: the process manager daemon on the execution host creates the application process]

Slide 9: TDP Job Startup Sequence
3. The process manager (PM) daemon creates the tool daemon process (if necessary).
[Figure: on the execution host, the PM daemon starts the tool daemon; the two are linked by TDP]

Slide 10: TDP Job Startup Sequence
4. The PM daemon passes information to the tool daemon (e.g., the application process PID, the front-end host/port, the standard I/O host/port).
[Figure: the PM daemon hands the PID and host/port pairs to the tool daemon]

Slide 11: TDP Job Startup Sequence
5. The tool daemon examines the application process (e.g., parses symbols, discovers the static call graph).

Slide 12: TDP Job Startup Sequence
6. The application process is allowed to run.
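
Steps 2 through 6 can be sketched on a POSIX system as follows. This is a minimal illustration under stated assumptions, not the TDP pilot code: the application stops itself with SIGSTOP just before exec (an approximation of "pause on exec"; a real tool daemon would typically take control through ptrace), the tool daemon is modeled as an in-process callback rather than a separate process, and `launch_paused` and `start_job` are hypothetical names.

```python
import os
import signal


def launch_paused(argv):
    """Step 2: fork the application and leave it stopped before it runs.

    Assumption: the child stops itself with SIGSTOP just *before* exec,
    approximating "pause on exec" without ptrace.
    """
    pid = os.fork()
    if pid == 0:  # child
        os.kill(os.getpid(), signal.SIGSTOP)  # wait here until SIGCONT
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec fails
    # Parent: block until the kernel reports the child as stopped.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError(f"child {pid} did not stop")
    return pid


def start_job(argv, tool_daemon, fe_host, fe_port):
    """Steps 2-6 of the startup sequence (hypothetical helper).

    Returns the application's exit code, or -signal if it was killed.
    """
    pid = launch_paused(argv)                      # 2. create app, suspended
    info = {"PID": str(pid),                       # 4. info for the tool daemon
            "FE_host": fe_host, "FE_port": str(fe_port)}
    tool_daemon(info)                              # 3+5. tool daemon inspects the app
    os.kill(pid, signal.SIGCONT)                   # 6. let the app run
    _, status = os.waitpid(pid, 0)                 # wait for the app to exit
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

For example, `start_job(["sh", "-c", "exit 3"], print, "cham.cs.wisc.edu", 7331)` prints the attributes handed to the "tool daemon" while the application is still stopped, then returns 3. Stopping before exec keeps the sketch in the standard library; the trade-off is that the tool sees the process before the new program image is loaded, so symbol parsing (step 5) would have to read the executable file rather than the live image.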

Slide 13: TDP Pilot Implementation
Goals:
- to try out TDP ideas and see what makes sense in a real environment;
- to collect informed suggestions for a standard.
The software: two well-established packages at U. Wisconsin-Madison:
- the Paradyn performance tool;
- the Condor resource management system.

Slide 14: Challenges
1. Process startup
2. Notification of exited processes
3. Inter-process communication
   - Mechanism
   - Identification of information to be transferred
   - Asynchronous notifications
4. Private networks and firewalls
   - Tool daemon communicating with its front-end
   - Application process sending standard I/O

Slide 15: Challenge: Process Startup
Most of the functionality is already in place, but not in the right place. The process startup logic needs to be refactored between the process manager daemon and the tool daemon.
Control handoff (from the process manager daemon to the tool daemon) is difficult under some OSs. On Linux, for example, there are two scheduling race conditions between the application process and the tool daemon.

Slide 16: Challenge: Exit Process Notification
We want the Condor starter to be aware if the application or the tool daemon process exits. Process exit notification: e.g., SIGCHLD to the parent under UNIX/Linux.
[Figure: the starter is the parent of both paradynd and the application process; each sends SIGCHLD to the starter when it exits]
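
The normal parent-side mechanism can be sketched as a SIGCHLD handler that reaps whatever children have exited. This is a minimal Python illustration, not Condor's starter code; `install_sigchld_reaper` and `wait_for_exit` are hypothetical names. One detail it shows is that a single SIGCHLD may stand for several exits, so the handler loops on a non-blocking `waitpid` until nothing is left to reap. In CPython, signal handlers can only be installed from the main thread.

```python
import os
import signal
import time


def install_sigchld_reaper():
    """Install a SIGCHLD handler that reaps exited children (hypothetical helper).

    Returns a dict that fills in as children exit: pid -> exit code
    (negative signal number if the child was killed by a signal).
    """
    exited = {}

    def on_sigchld(signum, frame):
        # One SIGCHLD may stand for several exits, so reap until none are left.
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return  # no children at all
            if pid == 0:
                return  # children exist, but none has exited yet
            if os.WIFEXITED(status):
                exited[pid] = os.WEXITSTATUS(status)
            elif os.WIFSIGNALED(status):
                exited[pid] = -os.WTERMSIG(status)

    signal.signal(signal.SIGCHLD, on_sigchld)
    return exited


def wait_for_exit(exited, pid, timeout=5.0):
    """Sleep until the handler has recorded pid's exit, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while pid not in exited and time.monotonic() < deadline:
        time.sleep(0.01)
    return exited.get(pid)
```

This works only while the starter really is the parent that the kernel notifies. The next two slides describe how attaching a tool daemon breaks that assumption.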

Slide 17: Challenge: Exit Process Notification
Parental relationships may change when the tool daemon attaches. On Linux, for example, the daemon process becomes the application process's parent. When the application process terminates, SIGCHLD is sent to paradynd, NOT to the Condor starter.
[Figure: after attaching, paradynd is the parent of the application process and receives its SIGCHLD; the starter remains the parent of paradynd]

Slide 18: Challenge: Exit Process Notification
SIGCHLD is delivered to the Condor starter only if paradynd calls wait(). Therefore Condor must either trust the monitoring daemon or poll the application process's state.
[Figure: the application's SIGCHLD reaches the starter only by way of paradynd]
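
The polling alternative can be sketched without relying on SIGCHLD at all. This minimal Python example is Linux-specific (matching the pilot's Linux 2.4 target) and is an assumption about how a starter might poll, not how Condor does it: it reads the state field of `/proc/<pid>/stat` and treats a missing entry or a zombie as "exited". Because it only inspects `/proc`, it works whether or not the starter is still the process's parent. `has_exited` and `poll_until_exit` are hypothetical names.

```python
import os
import time


def has_exited(pid):
    """Linux-only poll: True if pid is gone or is a zombie (exited, not yet reaped)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return True  # already reaped: the process no longer exists
    # The state field follows the ")" that closes the command name,
    # which may itself contain spaces or parentheses.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state in ("Z", "X")  # zombie or dead


def poll_until_exit(pid, interval=0.05, timeout=None):
    """Poll has_exited() until it is True; return False if the timeout expires."""
    start = time.monotonic()
    while not has_exited(pid):
        if timeout is not None and time.monotonic() - start > timeout:
            return False
        time.sleep(interval)
    return True
```

Polling trades latency (up to one `interval`) and some CPU for independence from the tool daemon; this is the trust-versus-poll choice the slide describes.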

Slide 19: Challenge: Information Transfer
"Attribute space": {name, value} pairs shared between processes.
- Mainly intra-host sharing between the process manager daemon and the tool daemon.
- Also sharing between the tool front-end and daemon, e.g., application PIDs for the front-end.
Basic idea from MPICH. It is not a Linda tuple space and not a global shared environment space.

Slide 20: Attribute Space (Execution Host)
Storing attributes:
tdp_put("PID", "2473")
tdp_put("FE_host", "cham.cs.wisc.edu")
tdp_put("FE_port", "7331")
[Figure: process manager daemon, tool daemon, and application process on the execution host; the attribute space holds PID=2473, FE_host=cham.cs.wisc.edu, FE_port=7331]

Slide 21: Attribute Space (Execution Host)
Retrieving attributes:
tdp_get("PID")
tdp_get("FE_host")
tdp_get("FE_port")
[Figure: the same processes reading PID=2473, FE_host=cham.cs.wisc.edu, FE_port=7331 from the attribute space]
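
The put/get pattern above can be sketched as a small thread-safe store. This is an in-process stand-in, not the TDP library: a real attribute space is shared between separate processes, so it would sit behind a socket or other IPC channel. The slides also do not say whether `tdp_get` blocks. This sketch assumes it waits until the attribute has been put (with an optional timeout), because that is what lets a tool daemon start before the process manager daemon has published the PID. The class name is hypothetical.

```python
import threading


class AttributeSpace:
    """In-process stand-in for the TDP attribute space (hypothetical).

    Assumption: tdp_get blocks until the attribute has been put, with an
    optional timeout; the real call's semantics are not given in the slides.
    """

    def __init__(self):
        self._attrs = {}
        self._cond = threading.Condition()

    def tdp_put(self, name, value):
        with self._cond:
            self._attrs[name] = value
            self._cond.notify_all()  # wake any getters waiting for this name

    def tdp_get(self, name, timeout=None):
        with self._cond:
            if not self._cond.wait_for(lambda: name in self._attrs, timeout):
                raise KeyError(name)  # not published within the timeout
            return self._attrs[name]
```

Usage mirrors the slides: the process manager daemon calls `space.tdp_put("PID", "2473")` and the tool daemon calls `space.tdp_get("PID")`, receiving `"2473"` as soon as it is available. Values are kept as strings, as in the slides.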

Slide 22: Challenge: Asynchronous Notification
Asynchronous notification uses the attribute space.
- In the process interested in an event, register an action: tdp_register_notify(handle, event, action)
- In the event-generating process, deliver the event to the attribute space: tdp_put(event, value)
The value is available in the action function.
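
A minimal sketch of register-then-put notification, again as an in-process stand-in rather than the TDP library. Assumptions: the slide's `handle` is modeled as the attribute-space object itself (`self`), an action is a callable taking `(event, value)`, and each action runs on its own thread so the putting process does not wait for it (a cross-process implementation would deliver over IPC instead). The class name and the event name `app_exited` are hypothetical.

```python
import threading
from collections import defaultdict


class NotifyingAttributeSpace:
    """Attribute space with registered actions (hypothetical stand-in).

    The slide's `handle` argument is modeled as `self`; an action is any
    callable taking (event, value).
    """

    def __init__(self):
        self._attrs = {}
        self._actions = defaultdict(list)
        self._lock = threading.Lock()

    def tdp_register_notify(self, event, action):
        with self._lock:
            self._actions[event].append(action)

    def tdp_put(self, event, value):
        with self._lock:
            self._attrs[event] = value
            actions = list(self._actions[event])
        for action in actions:
            # Deliver asynchronously: the putter does not wait for the action,
            # and the action receives the value that was put.
            threading.Thread(target=action, args=(event, value), daemon=True).start()
```

For example, a starter could register `space.tdp_register_notify("app_exited", on_exit)` and a tool daemon could later call `space.tdp_put("app_exited", "0")`; `on_exit` then runs with the value `"0"`. Actions are called outside the lock so that an action may itself call `tdp_put` without deadlocking.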

Slide 23: Challenge: Firewalls and Private Nets
[Figure: a firewall separates the local host (tool front-end) from the remote host (process manager daemon, tool daemon, application process); the connection across it is blocked]

Slide 24: Challenge: Firewalls and Private Nets
[Figure: the same hosts, with a communication proxy ("Comm Proxy") relaying traffic across the firewall]
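
The communication proxy in the figure can be sketched as a plain TCP relay. This is an assumption-laden illustration, not the TDP design: it assumes the proxy runs on a host both sides can reach (e.g., a gateway in the firewall), that traffic is ordinary TCP, and that the proxy forwards every connection it accepts to one fixed target (for instance, the tool front-end's host/port). `start_proxy` is a hypothetical name. The relay copies bytes in both directions and half-closes each direction when its source reaches end-of-file.

```python
import asyncio


async def _pump(reader, writer):
    """Copy bytes one way until EOF, then half-close the destination."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        if writer.can_write_eof():
            writer.write_eof()


async def _handle(client_reader, client_writer, target_host, target_port):
    try:
        up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # target unreachable: drop the client
        return
    try:
        await asyncio.gather(_pump(client_reader, up_writer),
                             _pump(up_reader, client_writer))
    finally:
        up_writer.close()
        client_writer.close()


async def start_proxy(listen_host, listen_port, target_host, target_port):
    """Listen on (listen_host, listen_port) and relay each connection to the target."""
    return await asyncio.start_server(
        lambda r, w: _handle(r, w, target_host, target_port),
        listen_host, listen_port)
```

A fixed target keeps the sketch short. A proxy for several tool daemons would instead need to learn each connection's destination, for example from a short header, or from host/port attributes in the attribute space.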

Slide 25: Status
Pilot implementation nearly complete:
- Paradyn with jobs submitted to Condor
- Linux 2.4
- "Create process" model
- Condor "vanilla" and "MPI" universes
Remaining work: library packaging, documentation.
Periodic planning meetings: Paradyn (Miller), Condor (Livny), U. Barcelona (Cortés, Senar), TUM (Wismüller), U. Vienna (Fahringer), U. Tennessee (Moore), MPICH (Butler, Gropp, Lusk), Etnus (Cownie, Delsignore), Globus (Kesselman), HP/Compaq (Balle), Pallas (Vampir group).

Slide 26: The Path Forward
- Identify the information that must be exchanged between the principals.
- Complete the design and implement the attribute space as a standalone package.
- Get other tool builders and process management system builders involved.
- Integrate TDP ideas into their systems to see what works.

Slide 27: Summary
TDP standardizes the interface between process management systems and tools:
- an API for tools and management systems;
- support libraries;
- a distributed attribute space.
It avoids the propagation of tool-specific and process-manager-specific interfaces.
Pilot implementation nearly complete.

Slide 28: TDP: The Tool Daemon Protocol
These are the early stages of this important effort, and we want your participation!
Draft report in progress; available for review and comments soon.
Web:
Pilot Implementation Team: Barton Miller, Philip Roth, Brandon Schendel, Victor Zandy, Miron Livny, Todd Tannenbaum, Derek Wright, Ana Cortés, Miquel A. Senar