Flexible Availability Computation Engine for WLCG
Rajesh Kalmady, Phool Chand, Vaibhav Kumar, Digamber Sonvane, Pradyumna Joshi, Vibhuti Duggal, Kislay Bhatt – Computer Division, BARC, India
Wojciech Lapka, IT-GT-TOM, CERN
CHEP 2010, Taipei
Introduction
GridView Availability Engine
◦ Standard benchmark for site performance measurement
Availability Computation Engine (ACE)
◦ LHC experiments require more flexibility
Availability and Reliability
Availability
◦ Unknown intervals are ignored for availability calculations
Reliability
◦ Reliability is not affected by Scheduled Downtime
[Figures: Sample Availability Graph, Sample Reliability Graph]
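The formulas on this slide did not survive extraction. The sketch below uses the commonly cited WLCG definitions, which match the two statements above: unknown time is dropped from the availability denominator, and scheduled downtime is additionally dropped from the reliability denominator. The `IntervalTotals` structure and the sample hours are assumptions made for illustration, not part of ACE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalTotals:
    """Hours spent in each state during one reporting period (hypothetical structure)."""
    up: float
    down: float            # unscheduled downtime
    scheduled_down: float  # scheduled downtime
    unknown: float         # no usable measurement


def availability(t: IntervalTotals) -> Optional[float]:
    """Uptime over known time; unknown intervals are ignored."""
    known = t.up + t.down + t.scheduled_down
    return t.up / known if known > 0 else None


def reliability(t: IntervalTotals) -> Optional[float]:
    """Uptime over known time, excluding scheduled downtime."""
    considered = t.up + t.down
    return t.up / considered if considered > 0 else None


# Sample day (made-up numbers): 18 h up, 2 h down, 3 h scheduled downtime, 1 h unknown.
day = IntervalTotals(up=18.0, down=2.0, scheduled_down=3.0, unknown=1.0)
print(round(availability(day), 3))  # 18 / 23 -> 0.783
print(reliability(day))             # 18 / 20 -> 0.9
```

Returning `None` when nothing is known is a design choice of this sketch; how ACE reports such periods is not stated on the slide.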
ACE – added values
Flexible availability algorithms
◦ Several algorithms per VO
◦ Support for sites as viewed by the VOs
Improvement of availability recomputations
Usage of a single authoritative topology provider
Topology – terminology
[Diagram: a SITE contains Service Types (CE, SE); each Service Type groups Service Flavours (glite-CE and CREAM CE under CE; SRMv1 and SRMv2 under SE); each Service Flavour has Services s1 ... sn]
Metrics and Profiles
[Diagram: Service Flavours CREAM CE, SRMv2 and sBDII, each with its Services s1 ... sn and its Metrics (t1 ... tn, t1' ... tn', t1'' ... tn''), grouped into Profile 1 and Profile 2]
Profile: a combination of metrics and services, plus the algorithm for availability computation
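One way to picture a profile as data is sketched below. This is only an illustration of the definition above: the `Profile` class, its field names, the VO name and the algorithm labels are all hypothetical and do not reflect ACE's actual schema. The metric names t1, t2 are the placeholders used on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Profile:
    """Hypothetical container: which metrics count for which flavour, and how to combine them."""
    name: str
    vo: str
    metrics: Dict[str, List[str]]  # service flavour -> metric names
    algorithm: str                 # label of the aggregation algorithm (assumed)

    def flavours(self) -> List[str]:
        """Service flavours covered by this profile, in sorted order."""
        return sorted(self.metrics)

    def applies_to(self, flavour: str, metric: str) -> bool:
        """True if a result of `metric` on `flavour` feeds this profile."""
        return metric in self.metrics.get(flavour, [])


# Two profiles over overlapping sets of flavours and metrics (made-up content).
profile_1 = Profile(
    name="Profile 1",
    vo="my_vo",
    metrics={"CREAM CE": ["t1", "t2"], "SRMv2": ["t1'", "t2'"]},
    algorithm="standard",
)
profile_2 = Profile(
    name="Profile 2",
    vo="my_vo",
    metrics={"CREAM CE": ["t1"], "SRMv2": ["t1'"], "sBDII": ["t1''"]},
    algorithm="custom",
)

print(profile_1.applies_to("CREAM CE", "t2"))  # True
print(profile_1.applies_to("sBDII", "t1''"))   # False
print(profile_2.flavours())
```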
Availability Algorithm
[Diagram: Service status → Aggregation for a Service Type (AND / OR / % / ...) → Service Type status → Aggregation for a Site (AND / OR / % / ...) → Site status]
Computations in ACE (1/3)
[Diagram labels: Service Status Computation, Metrics, Metric Results, Services, Aggregation Algorithm, Profile]
Computations in ACE (2/3)
[Diagram labels: Service Status Computation, Metrics, Metric Results, Services, Service Status, Service Type Availability and Reliability, Site, Aggregation Algorithm, Profile]
Computations in ACE (3/3)
[Diagram labels: Service Status Computation, Metrics, Metric Results, Services, Service Status, Service Type Availability and Reliability, Site, Aggregation Algorithm, Site Availability and Reliability Computation, Profile, Service Type Status]
Standard GridView algorithm
Service (s) = (service flavour, node); Service Type: e.g. CE, SE, BDII, ...
Metric Results, per (metric, s, vo)
◦ ANDing: all metrics in OK state → service up
Service Status, per (s, vo)
◦ ORing: at least one service up → service type up
Service Type Status, per (site, service type, vo)
◦ ANDing: all Service Type statuses up → site up
Site Status, per (site, vo)
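The three aggregation steps of the standard algorithm can be sketched in a few lines. This is a minimal illustration for a single VO and a single point in time, not the GridView implementation; the nested-dictionary layout, the node names and the status strings other than OK are assumptions.

```python
def service_status(metric_results):
    """AND step: a service is up only if every one of its metric results is OK."""
    return all(result == "OK" for result in metric_results.values())


def service_type_status(service_statuses):
    """OR step: a service type is up if at least one of its services is up."""
    return any(service_statuses)


def site_status(site):
    """AND step: a site is up only if every service type is up.

    `site` maps service type -> {service: {metric: result}}, where a service
    is identified by a (service flavour, node) pair.
    """
    return all(
        service_type_status(service_status(m) for m in services.values())
        for services in site.values()
    )


# Hypothetical site with two CEs and one SE; t1, t2 are placeholder metric names.
site_x = {
    "CE": {
        ("CREAM CE", "ce1.example.org"): {"t1": "OK", "t2": "OK"},
        ("CREAM CE", "ce2.example.org"): {"t1": "OK", "t2": "CRITICAL"},
    },
    "SE": {
        ("SRMv2", "se1.example.org"): {"t1": "OK"},
    },
}

print(site_status(site_x))  # True: one CE is up and the only SE is up

# If the only SE fails a metric, the SE service type goes down and so does the site.
site_x["SE"][("SRMv2", "se1.example.org")]["t1"] = "CRITICAL"
print(site_status(site_x))  # False
```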
ACE – Dimensions
Availability and reliability numbers are computed for a Profile, a VO, and for:
◦ Service Flavour (e.g. CREAMCE, OSGCE, ...)
◦ Service Type (e.g. CE, SE)
◦ Site
Time dimensions:
◦ Hour
◦ Day
◦ Week
◦ Month
Flexible availability algorithms
Use case:
◦ LHC experiments need flexible algorithms
Examples:
◦ Site is in OK state if any of CE(s), ArcCE(s) or OSGCE(s) is in OK state
◦ Site is in OK state if at least 80% of FTS-es for my VO are available
◦ ...
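The two examples above can be expressed with one small, configurable aggregation function. The sketch below is an assumption about how such rules might be written down; the function name, the rule labels and the sample statuses are hypothetical, and ACE's real algorithm definition format is not shown in the slides.

```python
def aggregate(statuses, rule, percent=None):
    """Combine boolean up/down statuses with an AND, OR or percentage rule.

    An empty list of statuses is treated as "not up" (a choice of this sketch).
    """
    statuses = list(statuses)
    if not statuses:
        return False
    if rule == "AND":
        return all(statuses)
    if rule == "OR":
        return any(statuses)
    if rule == "PERCENT":
        if percent is None:
            raise ValueError("PERCENT rule needs a percent value")
        # Integer comparison avoids floating-point rounding at the threshold.
        return 100 * sum(statuses) >= percent * len(statuses)
    raise ValueError("unknown rule: " + rule)


# Example 1: site OK if any CE, ArcCE or OSGCE service is OK (made-up statuses).
flavour_up = {
    "CE": aggregate([False], "OR"),
    "ArcCE": aggregate([True, False], "OR"),
    "OSGCE": aggregate([], "OR"),
}
print(aggregate(flavour_up.values(), "OR"))  # True, because one ArcCE is up

# Example 2: site OK if at least 80% of the FTS services are available.
fts = [True, True, True, True, False]
print(aggregate(fts, "PERCENT", percent=80))  # True: 4 of 5 is exactly 80%
print(aggregate(fts[1:], "PERCENT", percent=80))  # False: 3 of 4 is 75%
```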
Several algorithms per VO
Each VO can define several algorithms on any set of WLCG services
Use cases:
◦ LHC experiments want to test Tier-1 and Tier-2 sites differently
◦ Experiments want to measure analysis capability and production capability at the sites
◦ Easier validation of new availability algorithms
◦ ...
Support for distributed sites
[Diagram: Services s1 ... sn located at two PHYSICAL SITEs are combined into a Resource Grouping]
Resource Grouping, e.g.: WLCG Federations, Tier-1 sites, ...
Improved availability recomputations
Use cases:
◦ More accurate recomputation of availabilities and reliabilities
◦ Automatic recovery from late measurements
Achieved by:
◦ Historical view of the WLCG topology
◦ Automatic recomputation of availabilities triggered by delayed arrival of metric results
Improved availability recomputations
Example – site availability/reliability:
◦ Site X contains 2 CEs: s1, s2
◦ 01-Oct: Status of services s1: OK, s2: DOWN
◦ 01-Oct: Availability: 100%, Reliability: 100%
◦ 02-Oct: s1 decommissioned
◦ 05-Oct: Availabilities recomputation for 01-Oct
◦ Old GridView engine: Availability: 0%, Reliability: 0%
◦ ACE: Availability: 100%, Reliability: 100%
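A plausible reading of this example is that the old engine recomputed against the topology current on the day of recomputation (only s2 left), while ACE looks up the topology that was valid on the day being recomputed. The sketch below reproduces the numbers under that assumption. The history table, the function names and the year 2010 are illustrative choices, not taken from ACE.

```python
from datetime import date

# Hypothetical topology history for Site X: (valid_from, valid_until_exclusive, services).
TOPOLOGY_HISTORY = [
    (date(2010, 9, 1), date(2010, 10, 2), {"s1", "s2"}),
    (date(2010, 10, 2), date.max, {"s2"}),  # s1 decommissioned on 02-Oct
]

# Service statuses recorded for 01-Oct (True = OK, False = DOWN).
STATUS_ON_01_OCT = {"s1": True, "s2": False}


def services_on(day):
    """Return the set of services that belonged to the site on `day`."""
    for valid_from, valid_until, services in TOPOLOGY_HISTORY:
        if valid_from <= day < valid_until:
            return services
    return set()


def site_up(statuses, services):
    """OR over the CEs of the site: up if at least one listed service was OK."""
    return any(statuses.get(s, False) for s in services)


target_day = date(2010, 10, 1)
recomputation_day = date(2010, 10, 5)

# Current-topology recomputation: only s2 is known, and s2 was DOWN.
old_style = site_up(STATUS_ON_01_OCT, services_on(recomputation_day))
# Historical-topology recomputation: s1 still counts, and s1 was OK.
ace_style = site_up(STATUS_ON_01_OCT, services_on(target_day))

print(old_style)  # False, i.e. availability 0% for 01-Oct
print(ace_style)  # True, i.e. availability 100% for 01-Oct
```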
Usage of topology provider
Old GridView engine:
◦ Topology taken from several sources
ACE:
◦ Topology taken from a single Aggregated Topology Provider (ATP)
Visualization
Present GridView interface adapted to display ACE metrics
Work in progress to display the ACE status and availability metrics in the new visualization portal
Future work
Integration with the new visualization portal
Dynamic availability and reliability recomputations
Graphical interface for defining algorithms
Summary
ACE satisfies the requirements of the LHC experiments
Links
WEB/Home
WEB/Home
Contact us:
Acknowledgments
Thank you to the GridView Team for their excellent work on the project:
◦ Rajesh Kalmady
◦ Phool Chand
◦ Vaibhav Kumar
◦ Digamber Sonvane
◦ Pradyumna Joshi
◦ Vibhuti Duggal
◦ Kislay Bhatt