TS4.10 Comp Reports
A New Approach to Computing Availability/Reliability Reports for EGI
Progress Report
C. Kanellopoulos, GRNET
9/14/2018
Current Situation
 - NGI Site A/R reports are delivered monthly
 - Computations are performed on a centralized infrastructure
 - The current implementation is more or less a closed solution
 - EGI Operations cannot interact with it via a direct interface
 - Re-computations etc. are handled via GGUS (SLM unit)
 - VO & NGI Core Services reports are generated by the Ops Portal
New A/R Reporting Service: Proposal
 - Open source solution
 - Include extensions for VO-wide metrics (in addition to service-wise, site-wise and NGI-wise)
 - Direct interface for SLM Units and EGI Operations (via API)
 - Computations performed under profiles
 - Deliver/query results via a front-end module
Overview
Initial Goal: Replicate Current ACE Functionality
Demo at the EGI User Forum:
 - Retrieve monitoring data from the brokers
 - Calculate A/R for sites
 - Calculate A/R for NGI Core Services & VOs (Lavoisier)
 - Distribute A/R results through Lavoisier
 - Perform re-calculations
Data Acquisition
 - A consumer service has been developed
 - Listens on a configurable set of message queues
 - Initial message-level filtering
 - Supports multiple backends for storing the monitoring data
 - The default backend is the filesystem, so that the data can be fed to Hadoop
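The consumer design described above (filter each message, then hand it to one or more storage backends) can be sketched as follows. This is a minimal illustration, not the actual consumer: the names `Backend`, `FilesystemBackend`, `MemoryBackend` and `consume`, the message fields, and the one-JSON-document-per-line file layout are all assumptions, and the message queue itself is replaced by a plain Python list.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Protocol, Sequence


class Backend(Protocol):
    """Hypothetical storage interface; any object with store() qualifies."""

    def store(self, message: dict) -> None: ...


class FilesystemBackend:
    """Appends one JSON document per line (assumed layout, easy to load in bulk)."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def store(self, message: dict) -> None:
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(message, sort_keys=True) + "\n")


class MemoryBackend:
    """Keeps messages in a list; stands in for any second backend."""

    def __init__(self) -> None:
        self.messages: list = []

    def store(self, message: dict) -> None:
        self.messages.append(message)


def consume(
    messages: Iterable[dict],
    accept: Callable[[dict], bool],
    backends: Sequence[Backend],
) -> int:
    """Filter messages and pass accepted ones to every backend; return the count stored."""
    stored = 0
    for message in messages:
        if not accept(message):
            continue  # message-level filtering happens before any storage
        for backend in backends:
            backend.store(message)
        stored += 1
    return stored


# Hypothetical sample messages; the real broker message schema is not shown here.
sample = [
    {"service": "CE", "metric": "metric.a", "status": "OK"},
    {"service": "", "metric": "metric.a", "status": "OK"},  # rejected by the filter
    {"service": "SE", "metric": "metric.b", "status": "CRITICAL"},
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "metrics.jsonl"
    memory = MemoryBackend()
    stored = consume(
        sample,
        lambda m: bool(m.get("service")),
        [FilesystemBackend(target), memory],
    )
    lines = target.read_text(encoding="utf-8").splitlines()
```

Keeping the backend interface down to a single `store` call is one way to let the filesystem default coexist with other stores without changing the consumer loop.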
POEM Retrieval Service
 - A service has been developed that downloads the latest profiles (once per day)
 - A POEM profile can change over time, but changes are infrequent
 - We need to be able to track the history of the POEM profiles
Topology Retrieval
 - Retrieve topology on a daily basis
 - Topology can change at any time, possibly multiple times per month
 - ACE currently uses the topology in effect at the time of computation
 - We will certainly need to keep topology information for the current and previous month (for re-calculation purposes)
 - Do we need a longer retention period?
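One way to support re-calculation against the topology that was valid on a given day is to keep dated snapshots and look up the most recent one at or before the day of interest. The sketch below assumes that approach; the `TopologyHistory` class, its method names, and the site and service names are hypothetical, and a snapshot is represented as a plain dictionary.

```python
import bisect
from datetime import date


class TopologyHistory:
    """Dated topology snapshots with an 'as of' lookup and simple retention."""

    def __init__(self) -> None:
        self._dates: list = []      # sorted snapshot dates
        self._snapshots: dict = {}  # date -> topology snapshot

    def record(self, day: date, topology: dict) -> None:
        """Store the topology retrieved on `day`, replacing any earlier copy."""
        if day not in self._snapshots:
            bisect.insort(self._dates, day)
        self._snapshots[day] = topology

    def as_of(self, day: date) -> dict:
        """Return the latest snapshot recorded on or before `day`."""
        idx = bisect.bisect_right(self._dates, day)
        if idx == 0:
            raise KeyError(f"no topology snapshot on or before {day}")
        return self._snapshots[self._dates[idx - 1]]

    def prune(self, keep_from: date) -> None:
        """Drop snapshots recorded before `keep_from` (the retention boundary)."""
        idx = bisect.bisect_left(self._dates, keep_from)
        for old in self._dates[:idx]:
            del self._snapshots[old]
        del self._dates[:idx]


# Hypothetical data: SITE-A gains an SE service mid-month.
history = TopologyHistory()
history.record(date(2013, 3, 1), {"SITE-A": ["CE"]})
history.record(date(2013, 3, 15), {"SITE-A": ["CE", "SE"]})

early = history.as_of(date(2013, 3, 10))
late = history.as_of(date(2013, 3, 20))
```

With daily retrieval, a retention period of two months amounts to calling `prune` with the first day of the previous month; a longer period only changes that boundary.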
Status Computation Engine
 - Built on top of Pig
 - The status of each service endpoint is computed per day on 24h time slices (hourly status)
 - Currently POEM profiles are used for aggregating metric results into status, BUT the engine can be extended to include other profiles at this stage as well
 - The engine also allows multiple status results per profile
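To illustrate what an hourly status timeline could look like, the sketch below aggregates metric results for one endpoint into 24 hourly statuses. It is not the Pig implementation: the aggregation rule (each metric keeps its last reported status, the endpoint takes the worst status among the profile's metrics, and a metric that has not reported yet counts as UNKNOWN), the severity ordering, and the metric names are all assumptions made for the example.

```python
# Assumed severity ordering; higher means worse.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def hourly_status(results, profile_metrics):
    """Build a 24-entry status timeline for one endpoint.

    results: iterable of (hour, metric, status) tuples with hour in 0..23.
    profile_metrics: non-empty list of metric names the profile requires.
    """
    by_hour = {}
    for hour, metric, status in results:
        if metric in profile_metrics:  # metrics outside the profile are ignored
            by_hour.setdefault(hour, []).append((metric, status))

    latest = {}  # metric -> last reported status
    timeline = []
    for hour in range(24):
        for metric, status in by_hour.get(hour, []):
            latest[metric] = status
        statuses = [latest.get(m, "UNKNOWN") for m in profile_metrics]
        # The endpoint status for this hour is the worst of its metrics.
        timeline.append(max(statuses, key=SEVERITY.__getitem__))
    return timeline


# Hypothetical profile and results for a single day.
profile = ["metric.a", "metric.b"]
results = [
    (0, "metric.a", "OK"),
    (0, "metric.b", "OK"),
    (3, "metric.other", "CRITICAL"),  # not in the profile, so ignored
    (5, "metric.b", "CRITICAL"),
    (7, "metric.b", "OK"),
]
timeline = hourly_status(results, profile)
```

Passing a different `profile_metrics` list over the same results yields a different timeline, which is one simple reading of "multiple status results per profile".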
A/R Computation Engine
 - Built on top of HDFS & Hive (Hadoop SQL interface)
 - WIP: Service Flavor and Site Availability based on the specific topology retrieved from GOCDB
 - A daily history of the topology is to be kept, so that recalculations (e.g. for the previous month) will be performed based on the topology that existed at that time
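As a worked illustration of the A/R step, the sketch below derives availability and reliability from an hourly status timeline. The formulas are assumptions based on commonly used definitions (availability = up time / (total time - unknown time); reliability = up time / (total time - scheduled downtime - unknown time)); the slides do not state the exact formulas, and the actual engine runs on Hive rather than in Python. The function name and the choice to count WARNING as "up" are also assumptions.

```python
def availability_reliability(timeline, downtime_hours=frozenset()):
    """Return (availability, reliability) as fractions, or None when undefined.

    timeline: list of hourly statuses ("OK", "WARNING", "UNKNOWN", "CRITICAL").
    downtime_hours: indices of hours covered by scheduled downtime (assumed input).
    """
    total = len(timeline)
    up = unknown = scheduled = 0
    for hour, status in enumerate(timeline):
        if hour in downtime_hours:
            scheduled += 1  # scheduled downtime takes precedence over status
        elif status in ("OK", "WARNING"):
            up += 1
        elif status == "UNKNOWN":
            unknown += 1

    known = total - unknown
    availability = up / known if known else None
    reliability_base = known - scheduled
    reliability = up / reliability_base if reliability_base else None
    return availability, reliability


# Hypothetical day: 18 hours OK, 2 CRITICAL, 2 UNKNOWN, 2 in scheduled downtime.
day = ["OK"] * 18 + ["CRITICAL"] * 2 + ["UNKNOWN"] * 2 + ["CRITICAL"] * 2
availability, reliability = availability_reliability(day, downtime_hours={22, 23})
```

In this example the availability is 18/22 and the reliability is 18/20, since the two scheduled-downtime hours are excluded from the reliability denominator only.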
Computation Engine
Computation Engine: Future Work
 - VO and NGI Core Services A/R
 - Support for custom high-level profiles
API
 - Expose A/R calculations via the API
 - Expose “re-calculation” functionality via the API
 - Work has not started yet
User Interface
 - Built on top of the Lavoisier engine
 - Currently in operation, providing VO and NGI Core Services A/R reporting
 - Selection of the availability type (VO, NGI, sites)
 - Selection of the service group (CE, SE, TOP BDII, ALL)
 - Selection of a period, then selection of the granularity (monthly, daily, hourly)
 - Possibility to export to XLS and PDF, and different charts