Control and monitoring of trigger algorithms using Gaucho


1 Control and monitoring of trigger algorithms using Gaucho
Eric van Herwijnen Wednesday 12th October 2005

2 Contents
- The problem
- Gaucho architecture
- Implementation
- Experience
- Conclusions
Wed 12th October 2005 Control and monitoring of trigger algorithms using Gaucho

3 The problem
- Control and monitor trigger (Gaudi) processes on the event filter farm
- Send monitoring data (counters, rates, histograms, status, error messages) to the ECS (Experiment Control System)
- Configure jobs on the fly
- Combine information from individual CPUs
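The problem slide lists rates next to counters. One common way to obtain a rate is to difference two successive samples of a cumulative counter. The sketch below assumes that approach; `CounterSample` and `rateHz` are hypothetical names for illustration, not part of Gaudi's MonitorSvc or of Gaucho.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type (not Gaudi/Gaucho API): one snapshot of a cumulative
// event counter published by a single job.
struct CounterSample {
    std::uint64_t count;  // events processed since the job started
    double timeSeconds;   // when the sample was taken
};

// Average rate (Hz) between two samples of the same counter.
// Returns 0 if the counter went backwards, e.g. after a job restart.
double rateHz(const CounterSample& previous, const CounterSample& current) {
    const double dt = current.timeSeconds - previous.timeSeconds;
    assert(dt > 0.0 && "samples must be taken in increasing time order");
    if (current.count < previous.count) {
        return 0.0;  // counter reset: no meaningful rate for this interval
    }
    return static_cast<double>(current.count - previous.count) / dt;
}

// Example: 500 more events over a 10 s interval gives 50 Hz.
//   rateHz({1000, 0.0}, {1500, 10.0}) == 50.0
```

Treating a decreasing counter as a reset keeps a restarted job from producing a huge bogus rate on the panel.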

4 GAUCHO architecture (1 process)
Diagram: FSM commands “configure” and “start”; job states “ready” and “running”
- PVSS project with an FSM; Gaudi jobs are device units
- PVSS runs DIM clients to send commands to, and get data from, the Gaudi jobs
- PVSS runs a DIM server to send accumulated data to ROOT
- FSM command “configure” starts execution of the job
- The Gaudi job creates a DIM server
- The Gaudi job sends state “ready” to PVSS
- FSM command “start” sends DIM command “start” to the Gaudi job
- The Gaudi job starts the event loop and sends state “running” to PVSS
- Counters and histograms are sent to PVSS
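The command/state sequence on the architecture slide behaves like a small finite state machine. The C++ sketch below models only that sequence; `JobState`, `GaudiJobUnit`, `handle` and the initial `not_running` state are hypothetical, and no real DIM or PVSS calls are made (in Gaucho the commands travel as DIM commands and the states go back to the PVSS FSM).

```cpp
#include <cassert>
#include <string>

// Hypothetical model (not Gaucho/DIM API) of the states a Gaudi job
// reports to the PVSS FSM, following the sequence on the slide.
enum class JobState { NotRunning, Ready, Running };

class GaudiJobUnit {
public:
    JobState state() const { return state_; }

    // Apply an FSM command; returns false if it is not valid in the current state.
    bool handle(const std::string& command) {
        if (command == "configure" && state_ == JobState::NotRunning) {
            // Job is launched, creates its DIM server, then reports "ready".
            dimServerCreated_ = true;
            state_ = JobState::Ready;
            return true;
        }
        if (command == "start" && state_ == JobState::Ready) {
            // "start" forwarded to the job, which enters its event loop.
            assert(dimServerCreated_);
            state_ = JobState::Running;
            return true;
        }
        return false;  // e.g. "start" before "configure" is rejected
    }

    // State string the job would report back to PVSS.
    std::string reportedState() const {
        switch (state_) {
            case JobState::Ready: return "ready";
            case JobState::Running: return "running";
            default: return "not_running";  // hypothetical initial-state name
        }
    }

private:
    JobState state_ = JobState::NotRunning;
    bool dimServerCreated_ = false;
};
```

Rejecting out-of-order commands mirrors what an FSM device unit is for: the control system, not the job, decides which transitions are legal.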

5 Implementation
C++: Gaudi MonitorSvc allows the same code to run online and offline
PVSS:
- Panel structure:
  - per job (counters, configuration, dynamic subscription to histograms on the transient store)
  - per node (two jobs; counters and histograms summed/averaged)
  - per subfarm (n nodes)
- ~30 datapoints per job, ~10 DPEs (datapoint elements) each
- ~100 DIM services per job (some internal)
- DIM services set up in a PVSSCTRL process
- PVSS library to manipulate histograms (executed when panels are open)
- Packaged as an LHCb JCOP Framework-compatible tool
ROOT viewer for 2D histograms and further analysis
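Per-node panels combine what the individual jobs publish. A minimal sketch of that combination, assuming every job publishes one event counter and a 1D histogram with identical binning: counters are summed and also averaged per job, and histograms are summed bin by bin. `JobSnapshot`, `NodeSummary` and `summarizeNode` are hypothetical; the slides do not say which quantities Gaucho sums and which it averages.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-job snapshot (not Gaucho API): one counter plus a
// 1D histogram whose binning is the same in every job.
struct JobSnapshot {
    long eventsProcessed;
    std::vector<double> histogramBins;
};

// Hypothetical per-node result shown on a node panel.
struct NodeSummary {
    long totalEvents = 0;            // counter summed over jobs
    double meanEventsPerJob = 0.0;   // same counter averaged over jobs
    std::vector<double> summedBins;  // histograms summed bin by bin
};

NodeSummary summarizeNode(const std::vector<JobSnapshot>& jobs) {
    NodeSummary summary;
    if (jobs.empty()) return summary;  // nothing published yet
    summary.summedBins.assign(jobs.front().histogramBins.size(), 0.0);
    for (const JobSnapshot& job : jobs) {
        assert(job.histogramBins.size() == summary.summedBins.size() &&
               "all jobs must share the same binning");
        summary.totalEvents += job.eventsProcessed;
        for (std::size_t i = 0; i < job.histogramBins.size(); ++i) {
            summary.summedBins[i] += job.histogramBins[i];
        }
    }
    summary.meanEventsPerJob =
        static_cast<double>(summary.totalEvents) / static_cast<double>(jobs.size());
    return summary;
}
```

The same summing rule extends to the per-subfarm level (n nodes), as long as the binning still matches across nodes.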

6 Experience
- First experience during the RTTC was bad: too much CPU usage on the PVSS machine
- Scripts rewritten; latest tests with 20 jobs on 10 lxplus nodes are better
- Tests with a dummy Gaudi job
- Idle configuration for 1 node (2 jobs): 80 MB, 4% CPU (excluding PVSS itself)

13 Experience
- 1 node, 2 jobs: 71% CPU usage on the PVSS machine
  - 38% PVSSCTRL, 20% PVSSEvent, 10% PVSSData, the rest other PVSS processes
- Stopping jobs takes 2 s; all processes reduce CPU consumption as expected
- Now try 20 jobs over 10 lxplus nodes
  - Idle configuration: 225 MB, 8% CPU

15 Experience with 20 jobs
- 2205 DIM services
- CPU usage 100% on the PVSS machine
- Viewing counters (10 s) and histograms (20 s) OK
- Proportion between PVSSCTRL, PVSSEvent and PVSSDim the same
- Stopping jobs takes about 2 minutes; CPU usage correctly drops
- Some unexplained crashes of PVSSDIM; memory usage stays high after stopping
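A rough consistency check using only numbers quoted on these slides (~100 DIM services and ~30 datapoints of ~10 DPEs per job; 2205 DIM services for 20 jobs; stop times of 2 s for 2 jobs and about 2 minutes for 20 jobs):

```latex
\[
\frac{2205\ \text{services}}{20\ \text{jobs}} \approx 110\ \text{services/job}
\qquad (\text{estimate: } \sim 100\ \text{services/job})
\]
\[
20\ \text{jobs} \times 30\ \text{datapoints} \times 10\ \text{DPEs} \approx 6000\ \text{DPEs}
\]
\[
\frac{t_{\text{stop}}(20\ \text{jobs})}{t_{\text{stop}}(2\ \text{jobs})}
\approx \frac{120\ \text{s}}{2\ \text{s}} = 60
\quad\text{vs.}\quad \frac{20\ \text{jobs}}{2\ \text{jobs}} = 10
\]
```

The measured service count agrees with the per-job estimate, while the stop time grew roughly 60-fold for a 10-fold increase in jobs, i.e. faster than linearly. With only two data points this is an observation, not a scaling law.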

16 Conclusions
- Performance is now reasonable
- Next step: integration of Gaucho into the run control system of the LHCb event filter farm (November)

