The ALICE data quality monitoring
Barthélémy von Haller, CERN PH/AID
For the ALICE Collaboration

The ALICE experiment
LHC: Large Hadron Collider
ALICE: A Large Ion Collider Experiment
– 18 detectors
– Bandwidth to mass storage: 1.25 GB/s
– Event size: 86.5 MB
– Trigger rate: 10 kHz
14/05/2009 – IEEE RT2009 – Barthélémy von Haller - CERN PH/AID

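For a rough sense of scale, the bandwidth and event size quoted above can be combined in a short calculation. This is only a sketch: it assumes decimal units (1 GB = 1000 MB) and treats 86.5 MB as the size of every event, neither of which the slide states, so the result is an order of magnitude rather than a measured rate.

```python
# Figures quoted on the slide.
BANDWIDTH_GB_PER_S = 1.25   # bandwidth to mass storage
EVENT_SIZE_MB = 86.5        # event size

# Assumption (not stated on the slide): decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000


def storable_events_per_second(bandwidth_gb_s: float, event_size_mb: float) -> float:
    """Number of events of the given size that fit in the storage bandwidth each second."""
    return bandwidth_gb_s * MB_PER_GB / event_size_mb


events_per_s = storable_events_per_second(BANDWIDTH_GB_PER_S, EVENT_SIZE_MB)
print(f"{events_per_s:.1f} events of {EVENT_SIZE_MB} MB fit in the storage bandwidth per second")
```
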
Data Quality Monitoring
Online feedback on the quality of data
Avoid taking and recording low-quality data
Identify and solve problems early
Data Quality Monitoring (DQM) involves
– Online gathering of data
– Analysis by user-defined algorithm
– Storage of monitoring data
– Visualization

Data-Acquisition architecture
[Diagram; labels: DA, DQM, Sub-event]

The AMORE framework
AMORE: Automatic MOnitoring Environment
A DQM framework for the ALICE experiment

Design & Architecture
Publisher-Subscriber paradigm
Database used for the data pool
Notification with DIM (Distributed Information Management System)

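The publisher-subscriber flow through a data pool can be illustrated with a small sketch. Everything here is a stand-in: the `Pool` class, its method names and the object name `demoAgent/histogram` are hypothetical, a dictionary replaces the database, and plain callbacks replace DIM notifications. The point shown is that the notification carries only the object's name, and the subscriber then fetches the data from the pool.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Callback = Callable[[str], None]


class Pool:
    """Hypothetical in-memory stand-in for the data pool (a database in the slides)."""

    def __init__(self) -> None:
        self._latest: Dict[str, Any] = {}
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, name: str, callback: Callback) -> None:
        """Register a callback to be notified when `name` is published."""
        self._subscribers[name].append(callback)

    def publish(self, name: str, obj: Any) -> None:
        """Store the object, then notify subscribers with the name only."""
        self._latest[name] = obj
        for callback in self._subscribers[name]:
            callback(name)

    def fetch(self, name: str) -> Any:
        """Return the latest published value for `name`."""
        return self._latest[name]


# Usage: the subscriber reacts to the notification by reading from the pool.
pool = Pool()
received = []
pool.subscribe("demoAgent/histogram", lambda name: received.append(pool.fetch(name)))
pool.publish("demoAgent/histogram", [3, 1, 4])
print(received)
```

Keeping the payload out of the notification means publishers and subscribers only share the pool and a naming scheme, which is one common reason to pair a store with a lightweight notification channel.
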
Design & Architecture
Published objects are encapsulated in a "MonitorObject" structure
Plugin architecture using ROOT reflection
– Modules are dynamic libraries loaded at runtime

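The two ideas on this slide, wrapping a published object with metadata and loading modules by name at runtime, can be sketched as follows. This is not AMORE code: the fields of `MonitorObject` below are assumptions, `load_class` is a hypothetical helper, and Python's `importlib` stands in for ROOT reflection and C++ dynamic libraries. A standard-library class plays the role of a detector module.

```python
import importlib
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MonitorObject:
    """Hypothetical wrapper: the published payload plus metadata describing it."""
    name: str
    payload: Any
    agent: str
    timestamp: float = field(default_factory=time.time)


def load_class(module_name: str, class_name: str) -> type:
    """Import a module by name at runtime and return one of its classes."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Usage: the class is chosen by name at runtime, as a plugin would be.
counter_cls = load_class("collections", "Counter")
obj = MonitorObject(name="letters", payload=counter_cls("aabbbc"), agent="demoAgent")
print(obj.name, obj.agent, dict(obj.payload))
```
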
The Pool
Current implementation based on a database
MySQL: reliable, performant, open source

Archiving
Short-term history: First-In First-Out (FIFO)
Long-term archives: at end of run, at regular intervals, and at users’ request
[Diagram; labels: Agent, Publish, Latest value, Access, GUI, X recent values (FIFO), Temporary and permanent archive]
Archive triggers: start and end of run, regular time interval, at shifter’s request

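The short-term FIFO history and the triggered long-term archive can be sketched with a bounded queue. The `History` class and its method names are hypothetical, and a Python list stands in for the permanent archive; the sketch only shows the behaviour described above, namely that the X most recent values are kept and older ones are dropped unless an archive trigger copies them out.

```python
from collections import deque
from typing import Any, List, Tuple


class History:
    """Hypothetical per-object history: a FIFO of recent values plus an archive."""

    def __init__(self, depth: int) -> None:
        # deque(maxlen=...) silently discards the oldest entry once full.
        self._recent: deque = deque(maxlen=depth)
        self.archive: List[Tuple[str, Any]] = []

    def publish(self, value: Any) -> None:
        """Record a newly published value in the short-term history."""
        self._recent.append(value)

    @property
    def latest(self) -> Any:
        """The most recently published value."""
        return self._recent[-1]

    def recent(self) -> List[Any]:
        """The retained values, oldest first."""
        return list(self._recent)

    def archive_now(self, trigger: str) -> None:
        """Copy the latest value to the long-term archive, noting the trigger."""
        self.archive.append((trigger, self.latest))


# Usage: keep the 3 most recent of 5 published values, then archive at end of run.
history = History(depth=3)
for value in range(5):
    history.publish(value)
history.archive_now("end of run")
print(history.recent(), history.archive)
```
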
Subscriber & User Interface
Generic GUI
– Displays any object of any running agent
– Can handle the layout automatically
– Layouts can be complex and can be saved for future reuse
– Meets the users' basic need to check what the agents publish
For more complex needs, users can develop their own GUI

The generic GUI
[Screenshot; labels: Agent, Agents, Monitor Objects, Sub-directories]

The generic GUI
[Screenshot; labels: Save, Load]

Custom GUI

Packaging & validation
Subversion repositories
GNU Autotools
Distributed as RPM (1+12 packages)
Strict release procedure
– Build and validate the module on a test machine in a clean and controlled environment
Nightly build
– Identify broken code (wrong results, unable to compile)

Performance & benchmark
Online environment and heavy calculation: performance and scalability must be ensured
To identify and handle performance issues we need:
– Metrics
– Statistics
– Reproducible tests

Performance & benchmark
Same procedure and environment as for the validation of modules
– Estimation of needs for each detector
– Identification of variations over time
– Comparisons of machines, compilers and architectures

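The three ingredients named above (metrics, statistics, reproducible tests) can be illustrated with a minimal timing harness. This is a generic sketch, not the AMORE benchmark: `benchmark` and `fill_histogram` are hypothetical names, and the workload is a deterministic toy so that repeated runs process identical input.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(task: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Time `task` several times and summarise the wall-clock durations in seconds."""
    durations: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "stdev": statistics.stdev(durations),  # needs repeats >= 2
    }


def fill_histogram() -> List[int]:
    """Hypothetical workload: fill a 100-bin histogram from fixed pseudo-data."""
    bins = [0] * 100
    for i in range(50_000):
        bins[(i * 7919) % 100] += 1
    return bins


stats = benchmark(fill_histogram)
print({key: f"{value * 1e3:.2f} ms" for key, value in stats.items()})
```

Recording the same summary for each release, machine or compiler is what makes variations over time and between platforms comparable.
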
Performance & benchmark
Current DQM nodes: Intel(R) Xeon(R) CPU
Latest generation of Intel processor: Intel(R) Core(TM) i7 CPU

Database benchmark
All data transit through the pool: a critical part of the system
Tests of extreme and standard use cases
Several improvements made:
– Concatenate queries and insertions
– MySQL engine: MyISAM vs InnoDB

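One reading of "concatenate queries and insertions" is to submit many rows in a single call and transaction instead of one statement at a time. The sketch below illustrates that pattern under stated assumptions: Python's built-in `sqlite3` stands in for MySQL, and the `pool` table and its columns are hypothetical. The MyISAM vs InnoDB choice is specific to MySQL and is not represented here.

```python
import sqlite3

# Hypothetical monitoring rows: (object name, serialized payload).
rows = [(f"demoAgent/object{i}", b"payload") for i in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pool (name TEXT PRIMARY KEY, data BLOB)")

# Row by row: one statement and one transaction per insertion.
for name, data in rows[:500]:
    with conn:  # commits on leaving the block
        conn.execute("INSERT INTO pool VALUES (?, ?)", (name, data))

# Batched: many insertions submitted together in a single transaction.
with conn:
    conn.executemany("INSERT INTO pool VALUES (?, ?)", rows[500:])

count = conn.execute("SELECT COUNT(*) FROM pool").fetchone()[0]
print(count)
```
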
InnoDB vs MyISAM

Status
In production since last summer, used during commissioning and first beam
New features are regularly added, usually at users’ request
18 modules under development

Plans
Access to monitor objects through the web via the ALICE electronic LogBook
Fully automate the process: comparison to reference data, identification of problems, notification, actions taken
Add features to take full advantage of multi-core architectures

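The planned chain of comparison to reference data, identification of problems and notification could look roughly like the sketch below. The slides do not say which criterion is intended, so the per-bin relative tolerance used here is purely an assumption, and `deviating_bins`, the sample histograms and the printed message are hypothetical.

```python
from typing import List, Sequence


def deviating_bins(
    observed: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.2,
) -> List[int]:
    """Return indices of bins that differ from the reference by more than `tolerance` (relative)."""
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    flagged = []
    for index, (obs, ref) in enumerate(zip(observed, reference)):
        # Floor of 1.0 avoids a zero allowance for empty reference bins.
        allowed = tolerance * max(abs(ref), 1.0)
        if abs(obs - ref) > allowed:
            flagged.append(index)
    return flagged


# Usage: the third bin has dropped far below its reference value.
reference = [100, 120, 90, 110]
observed = [98, 125, 20, 108]
problems = deviating_bins(observed, reference)
if problems:
    print(f"notify shifter: bins {problems} deviate from the reference")
```
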
Conclusion
AMORE has been in production for almost a year
Increasing number of detector agents
Proved very useful during commissioning and the first beam period
Capable of handling a large number of agents, clients and objects
Ready for the LHC restart!