Managing Commands & Processes through CORBA (CHEP 2000, Padua): Sending Commands and Managing Processes across the BaBar OPR Unix Farm


1 Managing Commands & Processes through CORBA (CHEP 2000, Padua, 02/02/2000)
Sending Commands and Managing Processes across the BaBar OPR Unix Farm through C++ and CORBA
Tom Glanzman (SLAC) on behalf of Gilbert Grosdidier (LAL-Orsay) (for the BaBar Prompt Reconstruction and Computing Groups)
Paper #161 - CHEP 2000 - Padova

2 Context of the project
- Online Prompt Reconstruction (OPR)
  - Unix distributed farm, typically 100-200 nodes
  - Processing the BaBar raw data events within a projected latency of about 8 hours
- The purpose of this project was to build a tool able to launch, monitor and control remote processes inside this OPR farm
  - It was started at the beginning of October 1999
  - It is called GFD (Global Farm Daemon)

3 Functional Design
[Figure: functional design diagram. Labels: Command Node (GFD-Client Process); OPR Farm with 200 compute nodes; Compute Node (GFD Process, Detached Process); 2-way CORBA Call between the GFD-Client and the GFD processes.]

4 Internal Layout
[Figure: two software stacks.]
- Client stack: Perl wrapper, Gfd-Client driver, C++, CORBA (TAO), ACE, Sockets, Unix OS
- Server stack: GFD CORBA server, C++, CORBA (TAO), ACE, Sockets, Unix OS
- GFD server operations:
  - update cmd list
  - shutdown
  - launch procs
  - run Unix cmds
  - kill procs (fork, exec)

5 Design Requirements
This command layer was required to be:
- Fast: broadcasting to the whole farm in a few seconds
- Lightweight: the whole system must remain very simple
- Flexible: one can build sophisticated macros
- Robust: unreliable nodes do not interfere with others
- Improving process control:
  - all processes on a compute node belong to one userid
- Reasonably secure:
  - limited command library with aliases
  - ACLs specific to each command
- Reliable: started and monitored by a cron job
- Scalable: scales with the number of compute nodes to reach

6 Design Components
- Overall BaBar context was:
  - OO/C++, Distributed Computing, Unix
- Given the OPR context, we chose:
  - CORBA for the message layer
  - ACE/TAO as a C++ CORBA API
- Current versions built & running through:
  - Solaris 2.6
  - using native C++ (4.2) compiler
  - ACE 5.04 & TAO 1.04
  - using native Solaris threads

7 The Server components
- A set of high level TAO wrappers
  - provided by BaBar (see other talk)
- The Command Library
  - It is a human readable file, and contains:
    - alias name, together with complete command definition
    - options and parameters, in case of a macro
    - diverted log directory name (to store the results)
    - list of users allowed to access the command (ACL)
  - A special command reloads it after an update
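The slide lists what a Command Library entry holds but not the file syntax. The following is a minimal sketch of how such a file might be parsed and how a per-command ACL check might look. The one-line-per-alias format (`alias | command | logdir | user1,user2`) and every name in the code (`CommandEntry`, `parseLibrary`, `isAllowed`) are assumptions made for illustration, not the GFD format or API.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical command-library entry. The fields follow the slide;
// the names and the file syntax are assumptions.
struct CommandEntry {
    std::string command;            // complete command definition
    std::string logDir;             // directory receiving the diverted logs
    std::set<std::string> allowed;  // ACL: users allowed to run the alias
};

using CommandLibrary = std::map<std::string, CommandEntry>;

// Split `text` on `sep`, trimming spaces and tabs around each field.
std::vector<std::string> splitFields(const std::string& text, char sep) {
    std::vector<std::string> fields;
    std::istringstream in(text);
    std::string field;
    while (std::getline(in, field, sep)) {
        const auto first = field.find_first_not_of(" \t");
        const auto last = field.find_last_not_of(" \t");
        fields.push_back(first == std::string::npos
                             ? std::string()
                             : field.substr(first, last - first + 1));
    }
    return fields;
}

// Parse lines of the assumed form: alias | command | logdir | user1,user2
// Empty lines and lines starting with '#' are skipped; lines that do not
// have exactly four fields are ignored.
CommandLibrary parseLibrary(const std::string& text) {
    CommandLibrary lib;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const auto f = splitFields(line, '|');
        if (f.size() != 4 || f[0].empty()) continue;
        CommandEntry entry{f[1], f[2], {}};
        for (const auto& user : splitFields(f[3], ',')) {
            if (!user.empty()) entry.allowed.insert(user);
        }
        lib[f[0]] = entry;
    }
    return lib;
}

// True only if the alias exists and `user` is on its ACL.
bool isAllowed(const CommandLibrary& lib, const std::string& alias,
               const std::string& user) {
    const auto it = lib.find(alias);
    return it != lib.end() && it->second.allowed.count(user) > 0;
}
```

Reloading after an update, as the slide describes, would then amount to calling `parseLibrary` again on the file contents and swapping in the new map.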

8 The Server components (2)
- Returning information to the Client
  - Commands are mainly run asynchronously
    - as a background process, not waiting for the output
    - log its output onto a file, whose name is returned to the client
  - Other modes are available, for special use only
- The Command Processor
  - Authentication layer
  - A few commands are caught and processed directly
  - Option parsing, allowing utility switches
  - Execution layer:
    - the command string is built and wrapped into a "system()"-like call
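The asynchronous mode described above (run in the background, divert output to a log file, hand the file name back) can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not the GFD execution layer: the names `LaunchResult`, `launchDetached` and `waitForExit`, and the log naming scheme `<logDir>/<alias>.<pid>.<seq>.log`, are hypothetical.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// What the caller gets back at once: the child pid and the log file that
// receives the command's stdout and stderr. Hypothetical type.
struct LaunchResult {
    pid_t pid;            // -1 if the launch failed
    std::string logFile;  // path of the diverted output, empty on failure
};

// Start `command` through /bin/sh in the background and return immediately.
// The log file is opened in the parent so the child only has to redirect
// its descriptors and exec. The file naming is an assumption.
LaunchResult launchDetached(const std::string& alias,
                            const std::string& command,
                            const std::string& logDir) {
    static int sequence = 0;  // not thread-safe; enough for a sketch
    const std::string logFile = logDir + "/" + alias + "." +
                                std::to_string(getpid()) + "." +
                                std::to_string(++sequence) + ".log";
    const int fd = open(logFile.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return {-1, ""};

    const pid_t pid = fork();
    if (pid == 0) {
        // Child: send stdout and stderr to the log, then run the command.
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        close(fd);
        execl("/bin/sh", "sh", "-c", command.c_str(),
              static_cast<char*>(nullptr));
        _exit(127);  // reached only if exec failed
    }
    close(fd);  // parent no longer needs the descriptor
    if (pid < 0) return {-1, ""};
    return {pid, logFile};
}

// Blocking helper for the synchronous case: returns the exit status,
// or -1 if the child did not exit normally.
int waitForExit(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A long-running server would also need to reap finished children (for example with `waitpid` and `WNOHANG`) so they do not linger as zombies; that part is left out here.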

9 The Client
- It is used for many very different purposes:
  - check the status of the servers
  - launch and monitor specific tasks on all farm nodes
  - stop or kill some remote processes
- Two versions of the client coexist
  - both accessible through the same Perl wrapper
  - a single-threaded version targets a single node
  - a multi-threaded one tackles a string of nodes in one shot
    - the client uses a non-blocking loop to contact the GFDs
    - no delays if a misbehaving GFD is seen

10 The Client components
- The functions achieved by the client program:
  - select the GFDs and collect their TAO IORs
  - send the command alias and options through a CORBA call to every GFD
  - receive the returned data from the same CORBA call and process it, if any
- The multi-threaded version:
  - saves resources: memory, CPU time, name server calls
  - but it requires subtle and thoughtful coding. Some traps:
    - TAO initialisation at run-time to be MT safe
    - Avoid use of special CORBA types outside message handling
    - Check every utility or tool for MT-safety, or move it out of the thread
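The fan-out performed by the multi-threaded client can be sketched in standard C++ without any CORBA dependency. In the sketch below the remote call is a caller-supplied function standing in for the 2-way CORBA call; `NodeReply`, `RemoteCall` and `broadcast` are hypothetical names, and `std::async` is used here as a convenient substitute for the ACE/TAO threading the real client relies on. It illustrates only per-node isolation (a failing node is reported and does not disturb the others), not the non-blocking loop or any timeout handling.

```cpp
#include <exception>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Outcome of one call to one node. Hypothetical type.
struct NodeReply {
    std::string node;
    bool ok;
    std::string text;  // returned data, or the error message on failure
};

// Stand-in for the 2-way CORBA call: takes (node, alias), returns the reply.
using RemoteCall =
    std::function<std::string(const std::string&, const std::string&)>;

// Send `alias` to every node concurrently, one task per node.
// A node whose call throws is reported as failed; the others are unaffected.
std::vector<NodeReply> broadcast(const std::vector<std::string>& nodes,
                                 const std::string& alias,
                                 const RemoteCall& call) {
    std::vector<std::future<NodeReply>> pending;
    pending.reserve(nodes.size());
    for (const auto& node : nodes) {
        pending.push_back(std::async(std::launch::async, [node, alias, &call] {
            try {
                return NodeReply{node, true, call(node, alias)};
            } catch (const std::exception& e) {
                return NodeReply{node, false, e.what()};
            }
        }));
    }
    // Collect the replies in the same order as the input node list.
    std::vector<NodeReply> replies;
    replies.reserve(nodes.size());
    for (auto& result : pending) replies.push_back(result.get());
    return replies;
}
```

Keeping the remote call behind a plain function type also follows the slide's advice to keep special CORBA types out of the code that surrounds message handling.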

11 Command execution mode
Driver execution mode achieved | Command output on the GFD server | Returned to the client through CORBA | Command completion acknowledge through CORBA
Synchronously | Logged by GFD | Log filename (optional: full command output) | Yes, in case of failure
Asynchronously | Logged by GFD | Nothing (optional: log filename) | Yes, in case of failure

12 Performance
- Issuing "uname -a" to 100 nodes synchronously requires, in elapsed time:
  - 230 sec. when using "ssh -x"
  - 25 sec. when using the single-threaded client
    - the Perl wrapper loops over the nodes
  - 7.6 sec. when running a multi-node client
    - the client loops sequentially over the nodes
  - 4.7 sec. when running the multi-threaded client
- This MT client is scalable, and was extensively tested with GFDs running over 250 nodes
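The elapsed times above translate into relative speedups and average per-node costs with simple arithmetic: the multi-threaded client is roughly 49 times faster than "ssh -x" (230 / 4.7), about 5.3 times faster than the single-threaded client (25 / 4.7), and about 1.6 times faster than the sequential multi-node client (7.6 / 4.7). Per node, that is 2.3 s with ssh against 47 ms with the multi-threaded client. The helper names below are hypothetical; only the four timings and the node count come from the slide.

```cpp
#include <string>
#include <vector>

// One measurement from the slide: method name and elapsed seconds
// for "uname -a" issued to 100 nodes.
struct Timing {
    std::string method;
    double seconds;
};

// The four figures quoted on the slide.
std::vector<Timing> slideTimings() {
    return {{"ssh -x", 230.0},
            {"single-threaded client", 25.0},
            {"multi-node client", 7.6},
            {"multi-threaded client", 4.7}};
}

// How many times faster `seconds` is than `baseline`.
double speedup(double baseline, double seconds) {
    return baseline / seconds;
}

// Average elapsed time per node, in milliseconds.
double perNodeMs(double seconds, int nodes) {
    return 1000.0 * seconds / nodes;
}
```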

13 Conclusions
- GFDs in production for 3 months now: robust & reliable
- Most of the required functionalities implemented and running
- We have demonstrated successful use of CORBA in farm management
- However, carefully consider the use of CORBA (ACE+TAO) for a large project
  - Significant learning curve of weeks, not days
  - Documentation is weak, and not always reliable (but improving)
  - Fast response from ACE/TAO support team
- The current project was rather limited and simple, and constituted an ideal case study for the setup of these tools

14 Future & Evolution
- Extend the use of GFDs to all OPR servers, at least for monitoring
  - not only on compute nodes
- Embed this system in a "Global Farm Manager", to drive/coordinate the entire farm
  - not only managing processes
- Propose to use GFDs also in the DAQ system
  - not only Reconstruction farm
- We are still struggling to sort out a few (?) rough edges.

