1 First year experience with the ATLAS online monitoring framework
Alina Corso-Radu, University of California Irvine, on behalf of the ATLAS TDAQ Collaboration
CHEP 2009, March 23rd-27th, Prague
2 Outline
ATLAS trigger and data acquisition system at a glance
Online monitoring framework
Readiness of the online monitoring for runs with cosmic rays and first LHC beam
Conclusions
3 ATLAS Trigger/DAQ
Detectors: Inner Detector (Pixel, SCT, TRT), Calorimeter (LAr, TileCal), Muon Spectrometer (MDT, RPC, TGC, CSC).
Interaction rate ~1 GHz; bunch crossing rate 40 MHz.
LVL1 Trigger (hardware based, <100 kHz): coarse-granularity data, calorimeter and muon based, identifies Regions of Interest.
Read Out Systems (ROSs): 150 PCs.
LVL2 Trigger (~3 kHz): partial event reconstruction in Regions of Interest, full-granularity data, trigger algorithms optimized for fast rejection.
Event Builder (EB): ~100 PCs.
Event Filter (~200 Hz): full event reconstruction seeded by LVL2, trigger algorithms similar to offline; accepted events go to Data Storage.
High Level Triggers (LVL2 and Event Filter) are software based, running on 900 farm nodes (1/3 of the final system).
4 Online monitoring framework
Event Analysis Frameworks analyze event content and produce histograms.
Operational conditions of hardware and software detector elements, trigger and data acquisition systems are analyzed as well.
The Data Quality Analysis Framework performs automatic checks of histogram and operational data, visualizes and saves the results, and produces visual alerts.
Visualization Tools: set of tools to visualize information, aimed at the shifters.
Data Monitoring Archiving Tools: automatic archiving of histograms.
Web Service: monitoring data available remotely.
Diagram: event samples (Data Flow: ROD/LVL1/HLT) and operational data are published to the Information Service; the framework runs on about 35 dedicated machines.
Complexity and diversity in terms of the monitoring needs of the sub-systems.
5 Data Quality Monitoring Framework
Distributed framework that provides the mechanism to execute automatic checks on histograms and to produce results according to a particular user configuration.
Input and Output classes can be provided as plug-ins; custom plug-ins are supported.
About 40 predefined algorithms exist (histogram empty, mean values, fits, reference comparison, etc.); custom algorithms are allowed.
Writes DQ Results automatically to the Conditions Database.
Diagram: event samplers, the Event Analysis Frameworks and the Data Flow (ROD/LVL1/HLT) publish histograms to the Information Service; DQMF reads them through its Input Interface, takes its configuration from the Configuration DB via the Configuration Interface, and delivers DQResults through the Output Interface to the Conditions DB and the Data Quality monitoring display.
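To give a flavour of such a check, below is a minimal, self-contained C++ sketch of a mean-value test in the spirit of the predefined algorithms; the names (DQStatus, checkMean) and the interface are purely illustrative and are not the actual DQMF plug-in API.

#include <cmath>
#include <iostream>
#include <vector>

// Illustrative stand-in for a DQMF result flag (not the real DQMF types).
enum class DQStatus { Green, Yellow, Red };

// Flag a histogram whose mean drifts away from an expected value,
// mimicking one of the predefined "mean value" checks.
DQStatus checkMean(const std::vector<double>& binCenters,
                   const std::vector<double>& binContents,
                   double expectedMean,
                   double warnTolerance,
                   double errorTolerance) {
    double sumW = 0.0, sumWX = 0.0;
    for (std::size_t i = 0; i < binContents.size(); ++i) {
        sumW  += binContents[i];
        sumWX += binContents[i] * binCenters[i];
    }
    if (sumW == 0.0) return DQStatus::Red;           // empty histogram
    const double drift = std::abs(sumWX / sumW - expectedMean);
    if (drift > errorTolerance) return DQStatus::Red;
    if (drift > warnTolerance)  return DQStatus::Yellow;
    return DQStatus::Green;
}

int main() {
    // Toy histogram: bin centers and contents.
    std::vector<double> centers  {0.5, 1.5, 2.5, 3.5};
    std::vector<double> contents {10., 40., 35., 15.};
    DQStatus s = checkMean(centers, contents, /*expected*/ 2.0,
                           /*warn*/ 0.2, /*error*/ 0.5);
    std::cout << (s == DQStatus::Green ? "Green" :
                  s == DQStatus::Yellow ? "Yellow" : "Red") << "\n";
}

In the real framework such a check would be loaded as a plug-in and its result written to the Conditions Database; here the decision logic alone is shown.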
6 DQM Display
Summary panel shows the overall color-coded DQ status produced by DQMF per sub-system, the Run Control conditions, and log information.
Details panel offers access to the detailed monitored information: checked histograms and their references, and configuration information (algorithms, thresholds, etc.).
History tab displays the time evolution of DQResults.
About 17 thousand histograms are checked; the shifter's attention is focused on bad histograms.
7 DQM Display - layouts
DQM Display allows for a graphical representation of the sub-systems and their components using detector-like pictorial views, so bad histograms are spotted even faster.
DQM Configurator is an expert tool for editing the configuration, aimed at layouts and shapes:
from an existing configuration one can attach layouts and shapes
these layouts are created and displayed online the same way they will appear in the DQM Display
experts can tune layout/shape parameters until they look as required
8 Online Histogram Presenter
Main shifter tool for checking histograms manually.
Supports a hierarchy of tabs, each containing a predefined set of histograms; reference histograms can be displayed as well.
Sub-systems normally have several tabs with the most important histograms that have to be watched.
9 Trigger Presenter
Presents trigger-specific information in a user-friendly way: trigger rates, trigger chain information, HLT farm status.
Reflects the status of the HLT sub-farms using DQMF color codes.
Implemented as an OHP plug-in.
10 Histogram Archiving
Almost 100 thousand histograms are currently saved at the end of a run (~200 MB per run).
The archiving service reads histograms from the Information Service according to the given configuration and saves them to ROOT files.
It registers the ROOT files with the Collection and Cache service, accumulates files into large archives (ZIP), and sends them to CDR.
Archiving is done asynchronously with respect to the Run states/transitions.
Archived histograms can be browsed with a dedicated Archive Browser tool.
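A minimal sketch of the final saving step, assuming ROOT is available: the histogram names, the run-number-based file name, and the toy filling are illustrative stand-ins for histograms that the archiving service would actually fetch from the Information Service.

#include <memory>
#include "TFile.h"
#include "TH1F.h"
#include "TRandom3.h"

int main() {
    TRandom3 rng(4357);

    // Stand-ins for histograms that would be fetched from the Information Service.
    TH1F hOcc("PixelOccupancy", "Pixel occupancy;channel;entries", 100, 0., 100.);
    TH1F hRate("L1Rate", "LVL1 rate vs time;minute;kHz", 60, 0., 60.);
    for (int i = 0; i < 10000; ++i) hOcc.Fill(rng.Gaus(50., 10.));
    for (int i = 0; i < 60; ++i)    hRate.SetBinContent(i + 1, 75. + rng.Gaus(0., 2.));

    // One output file per run; the real service derives the name from run metadata.
    std::unique_ptr<TFile> out(TFile::Open("run_000001_histograms.root", "RECREATE"));
    hOcc.Write();
    hRate.Write();
    out->Close();  // the file would then be registered with Collection and Cache, zipped and sent to CDR
    return 0;
}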
11 Operational Monitoring Display
Each process in the system publishes its status and running statistics into the Information Service => O(1)M objects.
Reads IS information according to the user configuration and displays it as time-series graphs and bar charts.
Analyses distributions against thresholds; groups and highlights the information for the shifter.
Mostly used for the HLT farm status: CPU, memory, event distribution.
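As an illustration of the kind of threshold check described above, the following self-contained C++ sketch flags farm nodes whose latest published CPU usage exceeds warning or error thresholds; the node names, data, and function names are hypothetical and not part of OMD.

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Thresholds { double warn; double error; };

// Highlight nodes whose most recent sample breaches a threshold,
// the kind of check OMD applies to Information Service time series.
void highlight(const std::map<std::string, std::vector<double>>& cpuSeries,
               const Thresholds& t) {
    for (const auto& [node, series] : cpuSeries) {
        if (series.empty()) continue;
        const double latest = series.back();
        const char* level = latest > t.error ? "ERROR"
                          : latest > t.warn  ? "WARNING" : "ok";
        std::cout << node << ": cpu=" << latest << "% -> " << level << "\n";
    }
}

int main() {
    // Hypothetical per-node CPU usage samples (percent), newest last.
    std::map<std::string, std::vector<double>> cpu {
        {"pc-ef-node-001", {55., 62., 71.}},
        {"pc-ef-node-002", {80., 92., 97.}},
    };
    highlight(cpu, {75., 95.});   // warn above 75%, error above 95%
}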
12 Event Displays
Atlantis: Java-based 2D event display.
VP1: 3D event display running in the offline reconstruction framework.
Both Atlantis and VP1 have been used during commissioning runs and LHC start-up.
Both can be used in remote monitoring mode, browsing recent events via an HTTP server.
13 Remote access to the monitoring information
Public - monitoring via the Web Interface: information is updated periodically; no access restrictions.
Expert and Shifter - monitoring via the mirror partition: quasi real-time information access; restricted access.
14 Web Monitoring Interface
Generic framework running at P1 that periodically publishes information to the Web.
The published information is provided by plug-ins, currently two:
Run Status shows the status and basic parameters of all active partitions at P1.
Data Quality shows the same information as the DQM Display (histograms, results, references, etc.) with an update interval of a few minutes.
Diagram: monitoring at Point 1 (ATCN) - Data Flow (LVL1/HLT), Event Analysis Frameworks, Data Quality Analysis Framework, Data Monitoring Archiving Tools, Visualization Tools, Information Service and Web Service - serving a Web Browser on the CERN GPN (Remote Monitoring).
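A rough sketch of how such a plug-in-driven periodic publisher could be organized; the class and function names (WmiPlugin, RunStatusPlugin, publishOnce), the file-based output, and the hard-coded page content are hypothetical and not the actual WMI interface.

#include <chrono>
#include <fstream>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical plug-in interface: each plug-in renders one page for the web service.
class WmiPlugin {
public:
    virtual ~WmiPlugin() = default;
    virtual std::string name() const = 0;
    virtual std::string render() const = 0;   // produce the HTML payload
};

class RunStatusPlugin : public WmiPlugin {
public:
    std::string name() const override { return "run_status"; }
    std::string render() const override {
        // A real plug-in would query the Information Service here.
        return "<html><body><h1>Run Status</h1><p>partition: ATLAS, run: 000001</p></body></html>";
    }
};

void publishOnce(const std::vector<std::unique_ptr<WmiPlugin>>& plugins) {
    for (const auto& p : plugins) {
        std::ofstream(p->name() + ".html") << p->render();  // picked up by the web server
    }
}

int main() {
    std::vector<std::unique_ptr<WmiPlugin>> plugins;
    plugins.push_back(std::make_unique<RunStatusPlugin>());
    for (int cycle = 0; cycle < 3; ++cycle) {                // periodic publication loop
        publishOnce(plugins);
        std::this_thread::sleep_for(std::chrono::seconds(5));
    }
}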
15 Remote Monitoring via mirror partition
Remote users are able to open a remote session on one of the dedicated machines located on the CERN GPN:
the environment looks exactly like at P1
all monitoring tool displays are available and work exactly as at P1
the production system setup supports up to 24 concurrent remote users
Almost all information from the Information Service is replicated to the mirror partition and is available there with O(1) ms delay.
Diagram: monitoring at Point 1 (ATCN) with the Information Service mirrored to the Remote Monitoring infrastructure (CERN GPN), where the same Visualization Tools and a Web Browser are used.
16 Performance achieved
The Online Monitoring Infrastructure is in place and functioning reliably:
More than 150 event monitoring tasks are started per run.
It handles more than 4 million histogram updates per minute.
Almost 100 thousand histograms are saved at the end of a run (~200 MB).
Data Quality statuses are calculated online (about 10 thousand histograms checked per minute) and stored in the database.
Several Atlantis event displays are always running in the ATLAS Control Room and Satellite Control Rooms, showing events from several data streams.
Monitoring data is replicated in real time to the mirror partition running outside P1 (with a few ms delay).
The remote monitoring pilot system has been deployed successfully.
17 Conclusions
The tests performed on the system indicate that the online monitoring framework architecture meets ATLAS requirements.
The monitoring tools have been successfully used during data taking in detector commissioning runs and during LHC start-up.
Further details on DQM Display, Online Histogram Presenter and Gatherer on dedicated posters.
18
… …
19 Framework components
Users have to provide: C = configuration files, P = plug-ins (C++ code), JO = Job Option files.
GNAM (samples events from the Data Flow: ROD/LVL1/HLT via EMON Event Monitoring): P
MonaIsa: P
OMD (Operational Monitoring Display): C
Event Filter PT (Processing Task): JO
OHP (Online Histogram Presenter): C
DQMF (Data Quality Monitoring Framework): C
MDA (Monitoring Data Archiving): C
WMI (Web Monitoring Interface): P, C
Other components in the diagram: OH (Online Histogramming), IS (Information Service), Gatherer, TriP (Trigger Presenter), Event Display (ATLANTIS, VP1).