1
F. Scuri - I.N.F.N. Pisa: Online Monitoring for the CDF...

Online Monitoring for the CDF Run II Experiment and the remote operation facilities

F. Scuri a) for the CDF online monitoring group:
T. Arisawa b), D. Fabiani a), D. Hirschbuehl c), K. Ikado b), T. Kubo d), Y. Kusakabe b), K. Maeshima e), J. Naganoma b), K. Nakamura d), C. Plager e), E. Schmidt f), H. Stadie c), R. Tsychiya b), G. Veramendi g), W. Wagner c), H. Wenzel c), M. Worcester f), K. Yorita h)

a) I.N.F.N. Pisa, b) Waseda University, Tokyo, c) Universitaet Karlsruhe, d) K.E.K., Tsukuba, e) Univ. of California, LA, f) Fermilab, g) L.B.L., Berkeley, h) Chicago University

ACAT07, NIKHEF, Amsterdam, April 23-27, 2007
2
CDF is one of the multipurpose experiments running at the Tevatron (Fermilab)

[Figure: Fermilab accelerator complex, with labels: Main Injector and Recycler, Tevatron, Booster, p source, CDF, DØ]

- Proton-antiproton collisions at 1.96 TeV center-of-mass energy
- Peak luminosity now above 2 x 10^32 cm^-2 s^-1
3
The CDF (Collider Detector at Fermilab) detector

[Figure: detector cut-away, with labels: Central calorimeters, End-plug calorimeters, Forward muon, Central muon, Silicon tracker, Drift chamber, Solenoid, T.o.F.]

A typical architecture for high energy experiments: about ten sub-detectors, including:
- inner tracking systems and T.o.F.
- central superconducting solenoid (1.4 T)
- outer calorimeters and muon detectors
- 3-level trigger system
- multi-step data flow chain: L3 trigger (250 dual-processor PCs), robotic tape storage, production farm (150 dual-processor PCs), distributed Central Analysis Facilities (dCAFs) (9)
- a 20 PC cluster for the online systems: DAQ, detector calibration and operation monitor, event display, web servers

Detector and data flow complexity similar to the LHC experiments:
- 0.75 million channels to be operated and monitored
- 75 Hz L3 output at 20 MB/s (system limit), i.e. ~0.5 PB/year of data throughput
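As a rough cross-check of the figures on this slide, the Python sketch below converts the 20 MB/s L3 output limit into a yearly volume and into a mean event size at 75 Hz. The duty factor is an assumption introduced here (the slide does not quote one), and decimal units (1 PB = 1e9 MB) are assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def yearly_volume_pb(rate_mb_per_s: float, duty_factor: float = 1.0) -> float:
    """Yearly data volume in PB for a sustained rate in MB/s.

    duty_factor is the (assumed) fraction of the year spent writing
    at that rate; 1.0 means continuous running at the limit.
    """
    return rate_mb_per_s * SECONDS_PER_YEAR * duty_factor / 1e9


def mean_event_size_kb(rate_mb_per_s: float, event_rate_hz: float) -> float:
    """Mean event size in kB implied by a data rate and an event rate."""
    return rate_mb_per_s * 1000.0 / event_rate_hz


if __name__ == "__main__":
    print(f"continuous running: {yearly_volume_pb(20.0):.2f} PB/year")
    print(f"80% duty factor:    {yearly_volume_pb(20.0, 0.8):.2f} PB/year")
    print(f"mean event size:    {mean_event_size_kb(20.0, 75.0):.0f} kB")
```

Continuous running at the limit gives about 0.63 PB/year, so the quoted ~0.5 PB/year corresponds to roughly 80% of that ceiling.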
4
DESIGN CHARACTERISTICS

- Monitor the DAQ/trigger/detector performance without interfering with the data taking.
- Monitored results must be quick and clear for the shift crew to interpret, in order to maintain high-quality data taking.
- Different consumer processes can run on different machines (expandability).
- Each consumer receives only the data it needs (choice of triggers).
- The monitoring and the display processes are separated.
- The number of displays is limited only by network traffic and bandwidth.
- Different consumers can be combined into one executable.
- The CDF SW environment is the interface common to all detector partitions (easy maintainability).
- Grant safe remote access to the monitoring for experts and remote shifters.
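The "choice of triggers" point can be illustrated with a minimal Python sketch: each consumer declares the triggers it wants, and an event is delivered only to consumers whose selection overlaps the triggers the event passed. All class names, trigger names and the in-process dispatch are hypothetical; the real system distributes events over the network from the L3 trigger.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    number: int
    triggers: frozenset  # names of the triggers this event passed


@dataclass
class Consumer:
    name: str
    wanted_triggers: frozenset
    received: list = field(default_factory=list)

    def process(self, event: Event) -> None:
        # A real consumer would fill histograms; here we only record the event.
        self.received.append(event.number)


class EventDispatcher:
    """Hypothetical stand-in for the component that feeds the consumers."""

    def __init__(self):
        self.consumers = []

    def register(self, consumer: Consumer) -> None:
        self.consumers.append(consumer)

    def dispatch(self, event: Event) -> list:
        """Deliver the event to every consumer that asked for one of its triggers."""
        delivered = []
        for consumer in self.consumers:
            if consumer.wanted_triggers & event.triggers:
                consumer.process(event)
                delivered.append(consumer.name)
        return delivered


if __name__ == "__main__":
    dispatcher = EventDispatcher()
    dispatcher.register(Consumer("MonA", frozenset({"TRIG_A"})))
    dispatcher.register(Consumer("MonB", frozenset({"TRIG_B", "TRIG_C"})))
    print(dispatcher.dispatch(Event(1, frozenset({"TRIG_A", "TRIG_C"}))))
    print(dispatcher.dispatch(Event(2, frozenset({"TRIG_D"}))))
```

Because the selection is made before delivery, adding a consumer does not increase the traffic to the existing ones, which matches the expandability goal listed on the slide.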
5
Something more about the CDF online key words and key points

Consumers/Monitors:
1) are AnalysisControl++ (AC++) modules written in C++.
   - AC++ is the CDF offline analysis framework; it handles the data flow between modules
   - AC++ serves input, output, event generation, detector simulation, event reconstruction, database access and physics analyses; ROOT is used for event I/O and analysis tools
   - several updated versions are released each year; the online modules must follow the SW evolution
2) typical monitored objects: detector occupancies, trigger rates, luminosity, vertex position, physics objects, L3 reconstruction, calibration results, ...
3) provide template programs with utility methods to access data and to produce histograms without interfering with the data acquisition; standard procedures are defined to compare actual and reference plots and to provide instructions to the operators.
4) each specific consumer/monitor is written according to the sub-detector specs, within the CDF SW framework

1) + 2) + 3) + 4): just one person on shift (the Consumer Operator) can handle the full detector and data flow monitoring; sub-detector experts are paged only in case of major trouble.

An example of a fast-changing monitor: TrigMon (trigger monitor); trigger algorithms and tables change as the Tevatron luminosity increases at fixed bandwidth.

Display Server: a program that lets a Display Browser access the consumer output as a client.

Display Browser: a program passing data to the user end node; it can run on the same machine as the Display Server or on a different machine, connecting either via a socket or via the web through a plug-in for the Apache web server (ROOT compatibility).
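The slide does not describe how actual and reference plots are compared, so the following Python sketch only illustrates one possible automated check: the reference histogram is scaled to the number of entries in the current one, a chi-square per bin measures the shape difference, and bins that are empty although the reference expects entries are flagged as candidate dead regions. The function names and the `min_expected` threshold are assumptions made for this example.

```python
def scale_to(reference, total):
    """Scale the reference bin contents so that they sum to `total`."""
    ref_total = sum(reference)
    if ref_total <= 0:
        raise ValueError("reference histogram is empty")
    return [r * total / ref_total for r in reference]


def chi2_per_bin(actual, reference):
    """Shape-only chi-square per bin, using the scaled reference as expectation."""
    expected = scale_to(reference, sum(actual))
    terms = [(a - e) ** 2 / e for a, e in zip(actual, expected) if e > 0]
    if not terms:
        raise ValueError("no populated bins to compare")
    return sum(terms) / len(terms)


def dead_bins(actual, reference, min_expected=5.0):
    """Indices of bins with no entries where at least `min_expected` are expected."""
    expected = scale_to(reference, sum(actual))
    return [
        i for i, (a, e) in enumerate(zip(actual, expected))
        if a == 0 and e >= min_expected
    ]


if __name__ == "__main__":
    reference = [100, 100, 100, 100]
    current = [60, 0, 60, 60]
    print("chi2/bin :", chi2_per_bin(current, reference))
    print("dead bins:", dead_bins(current, reference))
```

A check of this kind could complement, not replace, the visual comparison and the written instructions that the operators follow.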
6
Online monitoring raw architecture

[Diagram labels: L3 Trigger; Push Protocol; Consumer Server; Consumer 1, Consumer 2, Consumer 3, ..., Consumer N; Server; Display; Config. via "Talk to"; Error Receiver; Run Control; Serious errors; WWW; State Monitor]
7
Element communication layout: 2 ways to display the consumer results

[Diagram labels: EVENTS; CONSUMERS (Consumer 1, Consumer 2, ...); SERVER; TRIGGER TABLES; TEMPLATES; DATABASE (Oracle management system); SOCKET; DISPLAY (fast socket connection); FILES; HTTP SERVER (APACHE); DISPLAY (WWW)]

Input through web-based Oracle forms, Oracle reports and automated cron jobs
8
Examples of Monitor Display

[Screenshots: Occupancy Monitor; Event Display]

- All monitor displays are updated within seconds
- Very useful to promptly detect new dead regions
- Histograms available on the web after update requests
- NEW! Snapshot pictures (30 s update) available via web
9
3-person crew for the CDF data-taking shifts

[Diagram labels: SciCo (Scientific Coordinator); ACE (DAQ Run Control and DAQ monitor); CO (Consumer Operator); Accelerator Operators; Detector on-call Experts]

The main job of the COs is to check the detector operation quality online:
a) by looking at the
   - Consumer Monitor outputs
   - Calibration results
   - Event Display
b) by submitting the results of a detailed checklist
10
Motivations and project framework for remote monitoring shifts (CDF CO (Consumer Operator) shifts)

- Avoid overseas trips just for monitoring shifts (8 shift days + RadWork training + travel time); the money saved allows more people to be on site for other activities: physics, detector upgrades, operations duties.
- Reduce the number of people on shift during the night at Fnal
- Acquire know-how for future applications of remote shifts (LHC, ILC, ...)

Tsukuba and Pisa were chosen as pilot institutions for the CDF Remote CO shift project.

Project developed in cooperation with CMS-Fnal, which is setting up a multipurpose remote control room in Wilson Hall at Fnal.
11
Remote shift development project integration

[Diagram: CDF: Tsukuba - Fnal, Pisa - Fnal; CMS: CERN - Fnal]
12
The CO duties (full list and original access modes)

Duty | Operated via | Remote allowed
Start/stop Consumers and their Monitors | Commands from any online cluster node | NO a)
Check detector plots and fill a run checklist every 2 hours | WEB | YES
Fill e-log with comments and post significant plots | WEB | YES
Execute the procedure to check several x-sections (XMon) | Command sequence from any online node | NO a)
Start the Calibration Monitor patch (DBANA) | Commands from any online cluster node | NO a)
Check calibrations | WEB | YES
Keep an eye on the monitor display | Running on dedicated online nodes | NO a)

a) Due to the DOE/Fnal policy, which allows only certified online group members to connect remotely to the online machines (not the case for a regular CO shifter)
13
SW developments to allow fully operational remote and local CO shifts in almost identical modes

Duty | Project | Present status
1) Start/stop consumers and their Monitors; switch from local to remote operation mode | Use a custom GUI, ConCon (Consumer Control), built on ROOT classes, with action buttons and process status windows (details below). ConCon runs on a node of the online cluster; in case of a remote shift, the GUI window is exported to the remote node | Tested and fully operating; switching between physics and cosmic modes is handled in the CDF control room by the ACE or SciCo
2) Execution of the special procedure (XMon) to monitor relevant counting rates | Moved to the web from a local node of the online cluster; the command sequence and parameter passing are handled by an HTML form invoking a Perl file | Tested and fully operating
3) Check of the detector calibration output | Actual and reference plots accessed via web | The SW package to produce the calibration output is handled by the ACE
4) Visual control of the real-time updated monitor displays | Snapshots of the 12 monitor displays accessed via web (30 s update) | Tested and fully operating
14
The new Consumer Controller GUI (ConCon)

[Screenshot of the ConCon window, with callouts: Consumer status action buttons (with pop-up menus); Running mode switch buttons; Consumer monitors action buttons (with pop-up menus); Monitors and event display status action buttons (with pop-up menus); Access to the log files]

- Based on ROOT classes
- Merges in a user-friendly way all the actions otherwise performed with many shell commands launching script files
- Allows remote operation when the window is exported from the local node where the GUI is running to the remote node (remote logins on the online cluster are forbidden)
15
Control room pictures

[Photo: section of the CDF control room at Fnal devoted to the Consumer Monitors]
[Photo: CDF control room in Pisa (more monitors to be added)]
16
Specs of the remote control room in Pisa: very easy and inexpensive!

- Echo-suppression webcam (Polycom ViaVideo)
- Desktop running Polycom PVX under Win XP, with an IP point-to-point connection
- 4 LCD 17"/19" monitor displays:
  - Screen 1: GUI windows to handle Consumers and job monitors
  - Screen 2: Event Display and Calibration plots
  - Screen 3: Active run and reference plots for the checklist
  - Screen 4: E-logs, checklist, instructions (web pages)
- Desktop running Mozilla, ROOT, Ghostview, ... under SL
- 4-monitor drive card

The number of screens will be increased to display the monitor snapshots.
17
Commissioning, pilot remote CO shifts and operation summary of the first regular remote CO shift period in Pisa

Commissioning (September-November 2006):
- Definition of the start-up procedure for shifts in Pisa and for emergency handling, e.g. replacement by a local (Fnal) shifter in case of any failure at start-up, or of network or HW failures during the shift
- Test and validation of new HW (Pisa and Fnal) for the video connection and for monitoring (Pisa)
- Check of the new SW (new GUI and new web-based procedures)

Pilot remote CO shifts (November 2006):
- Two 8-day regular owl shifts, with Pisa and Fnal in overlap
- Long-term network connection reliability was the main concern (short network breakdowns, up to about 15 min, are not a problem)

Regular remote CO shift operation summary (from December 2006):
- 4 shifts (one per month) until March 2007, 2 shifts scheduled in April 2007
- 6 x 8 days of statistics are very encouraging: no HW or SW problems detected, no network connectivity breakdowns, just one 15 min interruption due to scheduled maintenance
18
Conclusions

- The CDF online system has been operating reliably for several years; the system architecture and the SW have evolved smoothly.
- Real-time monitoring of the detector status and of the data quality can now also be fully carried out remotely, both for expert feedback and for the quality-checking and monitoring shifters (Consumer Operators).
- Local and remote CO shifts now proceed in exactly the same way.
- The changes required for the remote shifts resulted in a global improvement towards a more user-friendly environment.
- The first period of remote shifts from Pisa has been very positive.
- Based on the Pisa experience, CDF plans to operate other remote sites and to assign 100% of the owl (at Fnal) CO shifts to remote sites.
- We encourage other experiment communities worldwide to follow this example.
19
BACK-UP
20
Framework Components

I. The Consumers
- Analyze and monitor the event data
- Use the CDF Run II offline framework (AC++) to look at data*
- Consist of different AC++ modules:
  - APPConsumerInputModule
  - ConsumerErrorModule (adds a special destination to the ZOOM error logger to send errors to the ConsumerErrorReceiver)
  - a module inheriting from ConsumerFrameworkModule, which consists of different monitors written by the experts. All these monitors inherit from BaseMonitor2, which starts the server process at the beginning of a job.

* In order to look at CDF event data, a decision was made very early on to use the CDF offline framework for the consumers. However, the consumer framework is basically free from CDF offline specifics, and the entire package can easily be used in different settings.
21
Framework Components (cont.)

BaseMonitor2
- Base class for all monitors
- Provides "framework" functions and functions to be overridden by the monitor writers

TConsumerInfo
- is sent from the consumer to the server
- is sent from the server to the display
- contains information about the consumer:
  - name
  - run number
  - number of events processed
  - list of all ROOT objects that are available
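The split between framework functions and functions overridden by the monitor writers, together with the summary record that travels to the server and the display, can be sketched in Python as follows. This is an illustration only: the real classes are C++ (BaseMonitor2, TConsumerInfo), and every method name, field name and the list-based "histogram" below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConsumerInfo:
    """Summary record, modelled on the fields listed for TConsumerInfo."""
    consumer_name: str
    run_number: int = 0
    events_processed: int = 0
    object_names: list = field(default_factory=list)


class BaseMonitor:
    """Base class: framework functions plus hooks for the monitor writer."""

    def __init__(self, name: str):
        self.info = ConsumerInfo(consumer_name=name)
        self.histograms = {}

    # Framework functions (not meant to be overridden).
    def book(self, hist_name: str, n_bins: int) -> None:
        self.histograms[hist_name] = [0] * n_bins
        self.info.object_names.append(hist_name)

    def handle_event(self, run_number: int, event) -> None:
        self.info.run_number = run_number
        self.fill(event)
        self.info.events_processed += 1

    # Hooks to be overridden by the monitor writer.
    def begin_job(self) -> None:
        pass

    def fill(self, event) -> None:
        raise NotImplementedError


class OccupancyMonitor(BaseMonitor):
    """Example monitor: counts hits per channel; an event is a list of channel ids."""

    def begin_job(self) -> None:
        self.book("occupancy", n_bins=4)

    def fill(self, event) -> None:
        for channel in event:
            self.histograms["occupancy"][channel] += 1


if __name__ == "__main__":
    monitor = OccupancyMonitor("OccMon")
    monitor.begin_job()
    monitor.handle_event(1234, [0, 2, 2])
    print(monitor.info)
    print(monitor.histograms)
```

Keeping the bookkeeping in the base class means a sub-detector expert only writes the booking and filling code, which fits the slide's description of monitors written by the experts.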
22
Framework Components (cont.)

II. The Server
- receives ROOT objects from the consumer via socket
- deals with requests from the displays
- reports the status of the consumer to the state manager

III. The Display
- ROOT-based GUI
- can connect to the server via socket or read ROOT files; can browse ROOT objects from the consumers
- at first requests TConsumerInfo and creates a list tree
- only updates the objects on the canvas; does not redraw the whole canvas
- useful features such as auto update, slide show, pop-up warning window, etc.
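The display behaviour described here (ask for the object list first, then refresh only the objects currently shown) can be sketched in Python with an in-process stand-in for the server. The version counter, the class and method names, and the absence of any socket code are all simplifying assumptions; the slides do not say how the real display decides what to update.

```python
class Server:
    """Hypothetical in-process stand-in for the display server."""

    def __init__(self):
        self._objects = {}  # name -> (version, payload)

    def publish(self, name, payload):
        """Store a new copy of an object, bumping its version number."""
        version = self._objects.get(name, (0, None))[0] + 1
        self._objects[name] = (version, payload)

    def list_objects(self):
        return sorted(self._objects)

    def get(self, name):
        return self._objects[name]


class Display:
    def __init__(self, server):
        self.server = server
        # First request: the list of available objects, used to build the tree.
        self.tree = server.list_objects()
        self.canvas = {}  # name -> (version, payload) of the objects being shown

    def show(self, name):
        self.canvas[name] = self.server.get(name)

    def refresh(self):
        """Update only the shown objects whose version changed; return their names."""
        updated = []
        for name, (version, _payload) in list(self.canvas.items()):
            latest = self.server.get(name)
            if latest[0] != version:
                self.canvas[name] = latest
                updated.append(name)
        return updated


if __name__ == "__main__":
    server = Server()
    server.publish("occupancy", [1, 0, 2, 0])
    server.publish("trigger_rate", [10.0])
    display = Display(server)
    display.show("occupancy")
    server.publish("trigger_rate", [11.0])  # not on the canvas: ignored
    server.publish("occupancy", [2, 0, 3, 0])
    print(display.refresh())
```

Objects that are not on the canvas are never fetched during a refresh, which keeps the traffic per display proportional to what the user is actually looking at.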