CERN IT Department CH-1211 Geneva 23 Switzerland
The Experiment Dashboard
ISGC, April 2008
Pablo Saiz, Julia Andreeva, Benjamin Gaidioz, Anastasia Ivanchenko, Gerhild Maier, Ricardo Rocha, Irina Sidorova
IT-GS-MND
Overview
  Dashboard structure
  Dashboard in production
    – Job Monitoring
    – Grid reliability
    – Prodsys
    – Data Management
    – SAM
    – FTS monitoring
    – Site status board
  Future development
  Conclusions
Dashboard Framework
  Components: Web / HTTP interface, Data Access Layer (DAO), Agents, Oracle DB
  Data Access Layer (DAO)
    o DB reading and writing via the DAO layer
    o Connection pooling
    o Easy to add an interface for a different backend
  Agents
    o Collectors of information
    o Common configuration and management
  Web / HTTP interface
    o Multiple clients: CLI, web
    o Multiple output formats: plain text, CSV, XML, XHTML
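The layering above can be illustrated with a minimal sketch. It is not the Dashboard's actual code: the `JobDAO` interface, the `SQLiteJobDAO` class, the `jobs` table and the `render` helper are hypothetical names, and SQLite stands in for the Oracle backend only so the example runs anywhere. Connection pooling is left out.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from xml.etree import ElementTree as ET

FIELDS = ("job_id", "site", "status")


class JobDAO(ABC):
    """Hypothetical data-access interface; clients depend only on this."""

    @abstractmethod
    def add_job(self, job_id: str, site: str, status: str) -> None: ...

    @abstractmethod
    def jobs_by_site(self, site: str) -> list[dict]: ...


class SQLiteJobDAO(JobDAO):
    """Stand-in backend (assumption: the slides name Oracle, not SQLite)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT, site TEXT, status TEXT)"
        )

    def add_job(self, job_id: str, site: str, status: str) -> None:
        self._conn.execute("INSERT INTO jobs VALUES (?, ?, ?)", (job_id, site, status))
        self._conn.commit()

    def jobs_by_site(self, site: str) -> list[dict]:
        rows = self._conn.execute(
            "SELECT job_id, site, status FROM jobs WHERE site = ? ORDER BY job_id",
            (site,),
        )
        return [dict(zip(FIELDS, row)) for row in rows]


def render(rows: list[dict], fmt: str) -> str:
    """Serialize the same rows as plain text, CSV or XML."""
    if fmt == "txt":
        return "\n".join(" ".join(r[f] for f in FIELDS) for r in rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "xml":
        root = ET.Element("jobs")
        for r in rows:
            ET.SubElement(root, "job", attrib=r)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

Because callers only see `JobDAO`, supporting another backend means adding one more subclass; the rendering code does not change.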
Dashboard activities
  Experiment Dashboard
  Common applications (ALICE, ATLAS, CMS, LHCb, VleMed)
    o Job monitoring
    o Site reliability
  Experiment specific applications
    o Transfer monitoring for ALICE
    o Data management monitoring for ATLAS
    o Production monitoring for ATLAS and CMS (prototypes)
    o IO rate monitoring between WN and SE (prototype)
    o Site availability based on the results of SAM tests
    o Job Robot monitoring
    o Accounting information from Apel and Gratia for ATLAS (prototype)
    o Task monitoring for CMS analysis users (ATLAS on the way)
  CMS Integration and commissioning
Job Monitoring
  Display all the jobs submitted by a VO
    o Follow the status of the jobs
  Collect information from different sources
    o R-GMA, IC Real Time Monitor, BDII, MonALISA, …
  Very useful for VO managers, site admins and users
  Possibility to get the output in different formats
  Deployed for ALICE, ATLAS, CMS, LHCb and VleMed
Site Reliability
  Efficiency of the different sites
    o Jobs and job attempts
  List of most common errors
    o And recipes for their solutions
  Generic application
  Automatic generation of monthly reports
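As a rough illustration of the difference between job and job-attempt efficiency, the sketch below computes both per site, together with the most frequent errors. The definitions are assumptions made for the example (a job counts as successful if at least one of its attempts succeeded), and the `Attempt` record and `site_efficiency` function are hypothetical names, not part of the Dashboard.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attempt:
    """One execution attempt of a job at a site (hypothetical record)."""
    job_id: str
    site: str
    error: Optional[str]  # None means the attempt succeeded


def site_efficiency(attempts: list[Attempt], top_n: int = 3) -> dict:
    """Per-site attempt efficiency, job efficiency and most common errors."""
    per_site = defaultdict(list)
    for a in attempts:
        per_site[a.site].append(a)

    report = {}
    for site, items in per_site.items():
        ok_attempts = sum(1 for a in items if a.error is None)
        all_jobs = {a.job_id for a in items}
        # Assumption: a job is successful if any of its attempts succeeded.
        ok_jobs = {a.job_id for a in items if a.error is None}
        errors = Counter(a.error for a in items if a.error is not None)
        report[site] = {
            "attempt_efficiency": ok_attempts / len(items),
            "job_efficiency": len(ok_jobs) / len(all_jobs),
            "top_errors": errors.most_common(top_n),
        }
    return report
```

With resubmission, job efficiency can be much higher than attempt efficiency, which is why reporting both figures is informative.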
Production System
  ATLAS Prodsys
  Identify failing tasks and jobs
  Evaluate the performance of the sites
  Daily/weekly/monthly statistics
  User guide
Data Management
  Monitoring of T0 and the production system
  Report of transfers to the different sites
  Integrated with the ATLAS management system
  Information on the clouds, sites, SEs and datasets
  History of errors
FTS reliability
  Daily report on the success of transfers
  Drill-down list of errors
  Integrated in the ALICE environment
  Extremely useful during the different ALICE challenges: PDC06, PDC07, CRC08
  Working on making it generic
SAM monitoring
  Service Availability Monitoring
  Clickable plots to drill down:
    o Site availability
    o Service availability
    o Service tests
  Links to the SAM results
  At the moment, only for CMS
  ATLAS requested a similar interface
  Ongoing work to make it generic
Site Status Board
  Table with the status of the different sites for CMS
  Easy definition of new ‘metrics’
    o The ‘metrics’ can come from different sources
  Links to more detailed information
  At the moment, deployed for CMS
    o It could be used by other VOs
  Working on providing history
    o And aggregation…
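One way to picture "easy definition of new metrics from different sources" is a registry of per-site lookup functions that the board evaluates for every row. This is only a sketch under that assumption; `SiteStatusBoard`, `define_metric`, the metric names and the site names below are all hypothetical.

```python
from typing import Callable, Dict, List

# A metric source maps a site name to a displayable status value.
MetricSource = Callable[[str], str]


class SiteStatusBoard:
    """Hypothetical board: one row per site, one column per metric."""

    def __init__(self) -> None:
        self._metrics: Dict[str, MetricSource] = {}

    def define_metric(self, name: str, source: MetricSource) -> None:
        # Each metric may be backed by a different source (a DB query,
        # a web service, a file); the board only needs a callable.
        self._metrics[name] = source

    def table(self, sites: List[str]) -> List[dict]:
        return [
            {"site": site, **{name: src(site) for name, src in self._metrics.items()}}
            for site in sites
        ]


# Usage with two made-up sources standing in for real feeds.
board = SiteStatusBoard()
board.define_metric(
    "sam_availability",
    lambda s: {"SiteA": "ok", "SiteB": "degraded"}.get(s, "unknown"),
)
board.define_metric("job_efficiency", lambda s: {"SiteA": "95%"}.get(s, "n/a"))
rows = board.table(["SiteA", "SiteB"])
```

Adding a column then amounts to one more `define_metric` call, without touching the table-building code.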
Experiment Dashboard plans
  Include more data sources: condor_g, L&B, …
  Security: X.509 authentication
  New applications:
    o Pilot jobs
    o Input collections
  Improve existing applications
    o Make the SAM interface generic
    o More in-depth failure analysis
    o User requests and suggestions
  Integration with the GridMap technology
Conclusions
  The Experiment Dashboard provides:
    o Several monitoring applications
    o Integration of information from different sources
    o Multiple output formats: html, xml, csv, txt, …
  Generic applications: Job Monitoring, Grid reliability
  Experiment-specific applications: DDM, ProdSys, Site Status Board, SAM, …
  Used in production by multiple VOs
  User, installation and developer guides