Flexible Availability Computation Engine for WLCG Rajesh Kalmady, Phool Chand, Vaibhav Kumar, Digamber Sonvane, Pradyumna Joshi, Vibhuti Duggal, Kislay.

Flexible Availability Computation Engine for WLCG
Rajesh Kalmady, Phool Chand, Vaibhav Kumar, Digamber Sonvane, Pradyumna Joshi, Vibhuti Duggal, Kislay Bhatt (Computer Division, BARC, India)
Wojciech Lapka (IT-GT-TOM, CERN)
CHEP 2010, Taipei

Introduction
GridView Availability Engine
◦ Standard benchmark for measuring site performance
Availability Computation Engine (ACE)
◦ LHC experiments require more flexibility

Availability and Reliability
Availability
◦ Unknown intervals are ignored in availability calculations
Reliability
◦ Reliability is not affected by scheduled downtime
[Figures: sample availability graph; sample reliability graph]
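The two rules on this slide can be turned into a small calculation. This is a sketch that assumes the conventional WLCG formulas, Availability = Up / (Total − Unknown) and Reliability = Up / (Total − Scheduled Downtime − Unknown). The slide only states which intervals are excluded, so the exact formulas, the `PeriodDurations` layout and the function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeriodDurations:
    """Seconds spent in each status during one period (hypothetical layout)."""
    up: float
    down: float
    scheduled_downtime: float
    unknown: float

    @property
    def total(self) -> float:
        return self.up + self.down + self.scheduled_downtime + self.unknown


def availability(d: PeriodDurations) -> Optional[float]:
    # Unknown intervals are ignored: they are removed from the denominator.
    known = d.total - d.unknown
    return d.up / known if known > 0 else None


def reliability(d: PeriodDurations) -> Optional[float]:
    # Scheduled downtime is also removed, so it cannot lower reliability.
    counted = d.total - d.unknown - d.scheduled_downtime
    return d.up / counted if counted > 0 else None


# One day: 18 h up, 2 h down, 2 h scheduled downtime, 2 h unknown.
day = PeriodDurations(up=18 * 3600, down=2 * 3600,
                      scheduled_downtime=2 * 3600, unknown=2 * 3600)
print(f"availability={availability(day):.3f}, reliability={reliability(day):.3f}")
# availability=0.818, reliability=0.900
```

In this example the same 2 hours of scheduled downtime lower availability but leave reliability untouched, which is the distinction the slide draws.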

ACE: added value
Flexible availability algorithms
◦ Several algorithms per VO
◦ Support for sites as viewed by the VOs
Improved availability recomputations
Use of a single authoritative topology provider

Topology: terminology
SITE
◦ Service Type: CE
  ◦ Service Flavour: CREAM CE, with services s1 ... sn
  ◦ Service Flavour: glite-CE, with services s1 ... sn
◦ Service Type: SE
  ◦ Service Flavour: SRMv1, with services s1 ... sn
  ◦ Service Flavour: SRMv2, with services s1 ... sn
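The terminology can be captured in a small data model: a site contains service types, each service type has service flavours, and each flavour runs on one or more nodes. This is a minimal sketch. The flavour-to-type table follows the figure's labels, and the `Service` class, `site_tree` function and host names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Service(NamedTuple):
    """A service: one service flavour running on one node."""
    flavour: str
    node: str


# Hypothetical flavour -> service type mapping, taken from the figure's labels.
SERVICE_TYPE: Dict[str, str] = {
    "CREAM CE": "CE",
    "glite-CE": "CE",
    "SRMv1": "SE",
    "SRMv2": "SE",
}


def site_tree(services: List[Service]) -> Dict[str, Dict[str, List[str]]]:
    """Arrange a site's services as service type -> service flavour -> nodes."""
    tree: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s in services:
        tree[SERVICE_TYPE[s.flavour]][s.flavour].append(s.node)
    return tree


site = [Service("CREAM CE", "ce01.example.org"),
        Service("glite-CE", "ce02.example.org"),
        Service("SRMv2", "se01.example.org")]
for service_type, flavours in site_tree(site).items():
    print(service_type, dict(flavours))
```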

Metrics and Profiles
[Figure: services s1 ... sn of the service flavours CREAM CE, SRMv2 and sBDII, tested by the metrics t1 ... tn, t1' ... tn' and t1'' ... tn''; two example profiles, Profile 1 and Profile 2]
Profile: a combination of metrics and services, together with the algorithm used for availability computation
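A profile can be modelled as three components: the metrics it takes into account for each flavour, the services it covers, and the algorithm that combines service statuses. The sketch below assumes that a service is OK only if all of its selected metrics are OK. The `Profile` class, the metric names and the host names are hypothetical. It shows how two profiles applied to the same metric results can reach different conclusions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Service = Tuple[str, str]  # (service flavour, node)


@dataclass
class Profile:
    """Hypothetical model of a profile: metrics, services and an algorithm."""
    name: str
    metrics: Dict[str, Set[str]]  # service flavour -> metrics taken into account
    services: Set[Service]        # services the profile covers
    algorithm: Callable[[Dict[Service, bool]], bool]  # combines service statuses

    def service_ok(self, service: Service, results: Dict[str, str]) -> bool:
        # Assumption: all selected metrics must be OK; a missing result is not OK.
        flavour, _node = service
        return all(results.get(m) == "OK" for m in self.metrics[flavour])

    def evaluate(self, results: Dict[Service, Dict[str, str]]) -> bool:
        statuses = {s: self.service_ok(s, results.get(s, {})) for s in self.services}
        return self.algorithm(statuses)


ce: Service = ("CREAM CE", "ce01.example.org")
se: Service = ("SRMv2", "se01.example.org")
results = {ce: {"t1": "OK", "t2": "CRITICAL"}, se: {"t1'": "OK"}}

profile_1 = Profile("Profile 1", {"CREAM CE": {"t1", "t2"}, "SRMv2": {"t1'"}},
                    {ce, se}, lambda st: all(st.values()))
profile_2 = Profile("Profile 2", {"CREAM CE": {"t1"}, "SRMv2": {"t1'"}},
                    {ce, se}, lambda st: all(st.values()))
print(profile_1.evaluate(results), profile_2.evaluate(results))  # False True
```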

Availability Algorithm
Service status → aggregation for a Service Type (AND / OR / % / ...) → Service Type status → aggregation for a Site (AND / OR / % / ...) → Site status
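The two aggregation steps can be written with interchangeable rules, so that AND, OR or a percentage threshold can be selected at each level. This is a sketch under stated assumptions. The helper names (`all_up`, `any_up`, `at_least`, `site_status`) are hypothetical, and an empty set of inputs is treated as down by the percentage rule.

```python
from typing import Callable, Dict, Iterable

# An aggregation rule turns a collection of up/down statuses into one status.
Aggregator = Callable[[Iterable[bool]], bool]


def all_up(statuses: Iterable[bool]) -> bool:
    """AND: up only if every input is up."""
    return all(statuses)


def any_up(statuses: Iterable[bool]) -> bool:
    """OR: up if at least one input is up."""
    return any(statuses)


def at_least(fraction: float) -> Aggregator:
    """%: up if at least `fraction` of the inputs are up (no inputs means down)."""
    def rule(statuses: Iterable[bool]) -> bool:
        values = list(statuses)
        return bool(values) and sum(values) / len(values) >= fraction
    return rule


def site_status(services: Dict[str, Dict[str, bool]],
                type_rules: Dict[str, Aggregator],
                site_rule: Aggregator) -> bool:
    """Service status -> Service Type status -> Site status."""
    type_status = {t: type_rules[t](s.values()) for t, s in services.items()}
    return site_rule(type_status.values())


services = {"CE": {"ce01": True, "ce02": False},
            "SE": {"se01": True, "se02": True, "se03": False}}
print(site_status(services, {"CE": any_up, "SE": at_least(0.6)}, all_up))  # True
print(site_status(services, {"CE": any_up, "SE": at_least(0.8)}, all_up))  # False
```

The same service statuses give different site results depending only on the rules chosen. This is the kind of flexibility the slides attribute to ACE.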

Computations in ACE (1/3)
[Figure: Service Status Computation, which takes the Profile's Metrics and Services, the Metric Results and the Aggregation Algorithm as inputs]

Computations in ACE (2/3)
[Figure: as in (1/3); the computed Service Status is then used for the Service Type Availability and Reliability computation and aggregated towards the Site]

Computations in ACE (3/3)
[Figure: as in (2/3); the Service Type Status is combined by the Profile's Aggregation Algorithm in the Site Availability and Reliability Computation]

Standard GridView algorithm
Service (s) = (service flavour, node)
◦ Metric Results: per (metric, s, vo)
◦ Service Status: per (s, vo), ANDing: all metrics in OK state → up
◦ Service Type Status (e.g. CE, SE, BDII, ...): per (site, service type, vo), ORing: at least one service up → up
◦ Site Status: per (site, vo), ANDing: all service type statuses up → up
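The three steps of the standard algorithm can be sketched for one site and one VO as follows. The sketch assumes that only services with at least one metric result appear in the input, and that metric statuses are plain strings such as "OK". The function name, metric names and host names are hypothetical.

```python
from typing import Dict, Tuple

Service = Tuple[str, str]  # (service flavour, node)


def standard_site_status(metric_results: Dict[Service, Dict[str, str]],
                         service_type_of: Dict[str, str]) -> bool:
    """Site status for one VO following the three steps listed on the slide."""
    # Per (s, vo): AND, so a service is up only if all its metrics are OK.
    service_up = {s: all(v == "OK" for v in res.values())
                  for s, res in metric_results.items()}
    # Per (site, service type, vo): OR, so a type is up if any of its services is up.
    type_up: Dict[str, bool] = {}
    for (flavour, _node), up in service_up.items():
        stype = service_type_of[flavour]
        type_up[stype] = type_up.get(stype, False) or up
    # Per (site, vo): AND, so the site is up only if every service type is up.
    return bool(type_up) and all(type_up.values())


types = {"CREAM CE": "CE", "SRMv2": "SE"}
results = {
    ("CREAM CE", "ce01.example.org"): {"m1": "OK", "m2": "OK"},
    ("CREAM CE", "ce02.example.org"): {"m1": "CRITICAL", "m2": "OK"},
    ("SRMv2", "se01.example.org"): {"m3": "OK"},
}
print(standard_site_status(results, types))  # True: ce01 keeps the CE type up
```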

ACE: Dimensions
Availability and reliability numbers are computed for a Profile and a VO, and for:
◦ Service Flavour (e.g. CREAMCE, OSGCE, ...)
◦ Service Type (e.g. CE, SE)
◦ Site
Time dimensions:
◦ Hour
◦ Day
◦ Week
◦ Month
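The sketch below shows one way to roll hourly figures up into the hour, day, week and month dimensions. It assumes that up and unknown durations are summed over each period before the ratio is taken, rather than averaging hourly percentages. The slides do not specify the method, so that choice, the record layout and the year in the example are all hypothetical. Only availability is shown; reliability would additionally subtract scheduled downtime.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical hourly record: (start of hour, seconds up, seconds unknown).
HourlyRecord = Tuple[datetime, float, float]

PERIOD_KEY: Dict[str, Callable[[datetime], str]] = {
    "hour": lambda t: t.strftime("%Y-%m-%d %H:00"),
    "day": lambda t: t.strftime("%Y-%m-%d"),
    "week": lambda t: "%d-W%02d" % tuple(t.isocalendar()[:2]),
    "month": lambda t: t.strftime("%Y-%m"),
}


def availability_by(period: str, records: List[HourlyRecord]) -> Dict[str, float]:
    """Availability per period; unknown time is left out of the denominator."""
    up: Dict[str, float] = defaultdict(float)
    known: Dict[str, float] = defaultdict(float)
    for start, up_s, unknown_s in records:
        key = PERIOD_KEY[period](start)
        up[key] += up_s
        known[key] += 3600 - unknown_s
    # Periods that are entirely unknown have no availability value.
    return {k: up[k] / known[k] for k in known if known[k] > 0}


# One day (year chosen arbitrarily): 12 h up, 6 h down, 6 h unknown.
hours = [(datetime(2010, 10, 1, h),
          3600.0 if h < 12 else 0.0,
          3600.0 if h >= 18 else 0.0) for h in range(24)]
print(availability_by("day", hours))
```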

Flexible availability algorithms
Use case:
◦ LHC experiments need flexible algorithms
Examples:
◦ A site is in OK state if either its CE(s), ArcCE(s) or OSGCE(s) are in OK state
◦ A site is in OK state if at least 80% of the FTS instances for the VO are available
◦ ...
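The two example rules can be expressed as small predicates over a VO's service statuses. This is a sketch assuming a hypothetical `Statuses` layout (service flavour mapped to per-node OK flags). The flavour names come from the slide, while the function and host names are illustrative.

```python
from typing import Dict

# Hypothetical per-VO view: service flavour -> {node: is the service OK?}
Statuses = Dict[str, Dict[str, bool]]


def any_ce_ok(statuses: Statuses) -> bool:
    """Site OK if any CE, ArcCE or OSGCE service is OK."""
    return any(ok for flavour in ("CE", "ArcCE", "OSGCE")
               for ok in statuses.get(flavour, {}).values())


def fts_mostly_ok(statuses: Statuses, threshold: float = 0.8) -> bool:
    """Site OK if at least `threshold` of the VO's FTS services are available."""
    fts = list(statuses.get("FTS", {}).values())
    return bool(fts) and sum(fts) / len(fts) >= threshold


statuses = {
    "CE": {"ce01.example.org": False},
    "OSGCE": {"osg01.example.org": True},
    "FTS": {f"fts{i}.example.org": i != 5 for i in range(1, 6)},
}
print(any_ce_ok(statuses), fts_mostly_ok(statuses))  # True True (4 of 5 FTS up)
```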

Several algorithms per VO
Each VO can define several algorithms on any set of WLCG services
Use cases:
◦ LHC experiments want to test Tier-1 and Tier-2 sites differently
◦ Experiments want to measure analysis capability and production capability at the sites separately
◦ Easier validation of new availability algorithms
◦ ...
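Supporting several algorithms per VO amounts to keeping a registry of named algorithms for each VO and evaluating all of them on the same service statuses. This is a sketch only. The VO name, the "tier1"/"tier2" rules and the registry layout are hypothetical examples chosen to mirror the Tier-1/Tier-2 use case.

```python
from typing import Callable, Dict, List

Statuses = Dict[str, List[bool]]  # service type -> statuses of its services
Algorithm = Callable[[Statuses], bool]

# Hypothetical registry: one VO with several named algorithms.
ALGORITHMS: Dict[str, Dict[str, Algorithm]] = {
    "myvo": {
        # e.g. for Tier-1 sites: every CE and SE service must be up
        "tier1": lambda st: all(all(st[t]) for t in ("CE", "SE")),
        # e.g. for Tier-2 sites: at least one CE and one SE must be up
        "tier2": lambda st: all(any(st[t]) for t in ("CE", "SE")),
    }
}


def evaluate_all(vo: str, statuses: Statuses) -> Dict[str, bool]:
    """Run every algorithm registered for `vo` on the same service statuses."""
    return {name: algo(statuses) for name, algo in ALGORITHMS[vo].items()}


print(evaluate_all("myvo", {"CE": [True, False], "SE": [True]}))
# {'tier1': False, 'tier2': True}
```

Because every algorithm reads the same inputs, a new candidate algorithm can be registered alongside an existing one and compared directly. This matches the "easier validation" use case.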

Support for distributed sites
[Figure: services s1 ... sn shown both by PHYSICAL SITE and by Resource Grouping]
Resource groupings, e.g.: WLCG Federations, Tier-1 sites, ...
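A resource grouping can be treated like a site whose services come from several physical sites. The sketch assumes the grouping is evaluated with the standard OR-within-type, AND-across-types rule. The site names, the federation and the `status` helper are hypothetical. In this example each physical site is down on its own, but the grouping spanning both is up.

```python
from typing import Dict, Set, Tuple

# Hypothetical service key: (physical site, service flavour, node).
Service = Tuple[str, str, str]
TYPE_OF = {"CREAM CE": "CE", "SRMv2": "SE"}

service_ok: Dict[Service, bool] = {
    ("SITE-A", "CREAM CE", "ce01.a.example.org"): False,
    ("SITE-A", "SRMv2", "se01.a.example.org"): True,
    ("SITE-B", "CREAM CE", "ce01.b.example.org"): True,
    ("SITE-B", "SRMv2", "se01.b.example.org"): False,
}


def status(members: Set[Service]) -> bool:
    """Treat any set of services as a site: OR within a service type, AND across types."""
    type_up: Dict[str, bool] = {}
    for key in members:
        t = TYPE_OF[key[1]]
        type_up[t] = type_up.get(t, False) or service_ok[key]
    return bool(type_up) and all(type_up.values())


def physical_site(name: str) -> Set[Service]:
    return {k for k in service_ok if k[0] == name}


# Hypothetical resource grouping (e.g. a federation) spanning both sites.
federation = physical_site("SITE-A") | physical_site("SITE-B")
print(status(physical_site("SITE-A")), status(physical_site("SITE-B")), status(federation))
# False False True
```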

Improved availability recomputations
Use cases:
◦ More accurate recomputation of availabilities and reliabilities
◦ Automatic recovery from late measurements
Achieved by:
◦ A historical view of the WLCG topology
◦ Automatic recomputation of availabilities, triggered by the delayed arrival of metric results
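Automatic recomputation after late measurements can be driven by comparing each incoming result's timestamp with the end of the last computed period. This is a minimal sketch, assuming hourly granularity. The `RecomputationQueue` class and its methods are hypothetical, and a real engine would also advance `computed_until` as new periods are computed.

```python
from datetime import datetime
from typing import List, Set


class RecomputationQueue:
    """Hypothetical tracker of hourly periods whose availability is stale."""

    def __init__(self, computed_until: datetime) -> None:
        self.computed_until = computed_until  # end of the last computed period
        self.pending: Set[datetime] = set()

    def on_metric_result(self, measured_at: datetime) -> None:
        # A result for an already computed hour arrived late, so mark that hour.
        if measured_at < self.computed_until:
            self.pending.add(measured_at.replace(minute=0, second=0, microsecond=0))

    def drain(self) -> List[datetime]:
        """Return the hours to recompute, oldest first, and empty the queue."""
        hours = sorted(self.pending)
        self.pending.clear()
        return hours


q = RecomputationQueue(computed_until=datetime(2010, 10, 5))
q.on_metric_result(datetime(2010, 10, 1, 14, 37))  # late: 01-Oct already computed
q.on_metric_result(datetime(2010, 10, 5, 9, 0))    # on time: not yet computed
print(q.drain())  # [datetime.datetime(2010, 10, 1, 14, 0)]
```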

Improved availability recomputations
Example: site availability/reliability
◦ Site X contains 2 CEs: s1, s2
◦ 01-Oct: status of services s1: OK, s2: DOWN
◦ 01-Oct: Availability: 100%, Reliability: 100%
◦ 02-Oct: s1 decommissioned
◦ 05-Oct: recomputation of availabilities for 01-Oct
  ◦ Old GridView engine: Availability: 0%, Reliability: 0%
  ◦ ACE: Availability: 100%, Reliability: 100%
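The example can be reproduced with a tiny historical topology. The old engine effectively recomputes 01-Oct using the topology as of 05-Oct (only s2), whereas a historical view uses the services that belonged to the site on 01-Oct. This is a sketch. Days are plain integers, and the `TOPOLOGY` table and helper functions are hypothetical. Only availability is computed; with no scheduled downtime in the example, reliability would be identical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topology history for Site X: service -> (first day, last day or None).
TOPOLOGY: Dict[str, Tuple[int, Optional[int]]] = {
    "s1": (1, 1),     # decommissioned on 02-Oct, so last active on 01-Oct
    "s2": (1, None),  # still active
}
STATUS_01_OCT = {"s1": "OK", "s2": "DOWN"}


def services_on(day: int) -> List[str]:
    """Services that belonged to the site on a given day of October."""
    return [s for s, (first, last) in TOPOLOGY.items()
            if first <= day and (last is None or day <= last)]


def ce_availability(services: List[str]) -> float:
    # OR over the CEs: the CE service type is up if at least one CE is OK.
    return 1.0 if any(STATUS_01_OCT.get(s) == "OK" for s in services) else 0.0


# Recomputation on 05-Oct for 01-Oct:
old_engine = ce_availability(services_on(5))  # current topology: only s2
ace = ce_availability(services_on(1))         # topology as it was on 01-Oct
print(old_engine, ace)  # 0.0 1.0
```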

Usage of topology provider
Old GridView engine:
◦ Topology taken from several sources
ACE:
◦ Topology taken from a single Aggregated Topology Provider (ATP)

Visualization
◦ The present GridView interface has been adapted to display ACE metrics
◦ Work is in progress to display ACE status and availability metrics in the new visualization portal

Future work
◦ Integration with the new visualization portal
◦ Dynamic availability and reliability recomputations
◦ Graphical interface for defining algorithms

Summary
◦ ACE satisfies the requirements of the LHC experiments

Links
◦ WEB/Home
◦ Contact us:

Acknowledgments
Thank you to the GridView Team for their excellent work on the project:
◦ Rajesh Kalmady
◦ Phool Chand
◦ Vaibhav Kumar
◦ Digamber Sonvane
◦ Pradyumna Joshi
◦ Vibhuti Duggal
◦ Kislay Bhatt