Advancements in Availability and Reliability computation Introduction and current status of the Comp Reports mini project C. Kanellopoulos GRNET.


1 Advancements in Availability and Reliability computation
Introduction and current status of the Comp Reports mini project
C. Kanellopoulos, GRNET

2 Advancements in Availability and Reliability computation
- Introduction and current status of the Comp Reports mini project
- Results of the Requirements Gathering Task Force
- Demo
- Roadmap until the end of the project
- Open discussion

3 Introduction and current status of the Comp Reports mini project
- A/R reports in EGI-InSPIRE
- A new A/R reporting service
- Mini project information
- Technical Part
- High level overview of the architecture
- Functionality and implementation of the pilot

4 Current Situation
- NGI Site A/R reports are delivered monthly
- Computations are performed on the ACE centralized infrastructure
- Current implementation is based on commercial software
- EGI Operations cannot interact via a direct interface
- Re-computations etc. are handled via GGUS (SLM unit)
- VO & NGI Core Services Reports are generated by the Ops Portal

5 New A/R reporting service
- Open source solution
- Include extensions for VO-wide metrics (in addition to service-wise, site-wise and NGI-wise)
- Direct interface for SLM Units and EGI Operations (via API)
- Computations performed under profiles
- Deliver/Query results via front-end module

6 Mini project info
- Duration of the mini project: 12 months
- 3 Partners: CNRS, GRNET, SRCE
- Total available effort: ~16 PM
- Initial goal of the project: Open source alternative of ACE

7 Goal for EGI Technical Forum
- Replicate current ACE functionality
- Retrieve POEM profiles from POEM Service
- Retrieve monitoring data from the Brokers
- Retrieve topology information from GOCDB
- Retrieve HEPSPEC values
- Retrieve downtime information from GOCDB
- Prefilter raw monitoring data
- Calculate A/R for Sites & NGIs
- Calculate A/R for NGI Core Services & VOs (Lavoisier)
- Provide A/R API for integration with Lavoisier
- Distribute A/R results through Lavoisier

8 Goal for EGI Technical Forum
- Replicate current ACE functionality
- Retrieve POEM profiles from POEM Service
- Retrieve monitoring data from the Brokers
- Retrieve topology information from GOCDB
- Retrieve HEPSPEC values
- Retrieve downtime information from GOCDB
- Prefilter raw monitoring data
- Calculate A/R for Sites & NGIs
- Calculate A/R for NGI Core Services & VOs (Lavoisier)
- Provide A/R API for integration with Lavoisier
- Distribute A/R results through Lavoisier
Status markers on the slide: Done (x10)

9 Current Architecture

10 Data Retrieval
- Listens on a configurable set of message queues
- Runs as a system service
- Provides initial message-level filtering
- Supports multiple backends for storing the monitoring data
- Default backend is the filesystem, so that the data can be fed to Hadoop/HDFS, on top of which the computation engine runs
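The consume, filter, store flow on this slide can be sketched in a few lines. This is only an illustration, not the project's consumer: the message field names, the `ACCEPTED_PROFILES` set, the JSON-lines layout and the `FilesystemBackend` class are all assumptions, and the message queue itself is replaced by a plain Python iterable.

```python
import json
from pathlib import Path

# Hypothetical: profiles whose results we keep (not taken from the slides).
ACCEPTED_PROFILES = {"EXAMPLE_PROFILE"}

# Hypothetical: fields a message must carry to be considered complete.
REQUIRED_FIELDS = ("timestamp", "metric", "host", "status", "profile")


def accept(message: dict) -> bool:
    """Message-level filter: complete messages for accepted profiles only."""
    complete = all(field in message for field in REQUIRED_FIELDS)
    return complete and message["profile"] in ACCEPTED_PROFILES


class FilesystemBackend:
    """Stores accepted messages as JSON lines, one file per day."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, message: dict) -> Path:
        day = message["timestamp"][:10]  # assumes ISO 8601 timestamps
        path = self.root / f"metrics_{day}.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(message, sort_keys=True) + "\n")
        return path


def consume(messages, backend) -> int:
    """Filters an iterable of messages and returns how many were stored."""
    stored = 0
    for message in messages:
        if accept(message):
            backend.store(message)
            stored += 1
    return stored
```

Keeping the backend behind a single `store` method is one way to support several storage targets; daily files in a directory can then be copied into HDFS as they are.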

11 Data Retrieval
- Retrieves the POEM profiles from the central Gridmon service
- Provides information about Service, Profile, Service Flavor, Metric, NGI, VO, FQAN
- POEM profiles should not be confused with the (algorithmic) profiles used for calculating site availability
- Supports definition of SAM server, NGI, Profile tuples
- A POEM profile can change over time, but changes are infrequent
- Need to be able to track history
- Runs as a scheduled task (once per day)

12 Data Retrieval
- Retrieves topology daily
- Topology can change at any time, multiple times per month
- ACE uses the current topology at the time of computation
- We keep topology information for the current and previous month (for re-calculation purposes)
- Do we need a longer period of retention?
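A small sketch of what keeping daily topology snapshots for the current and previous month could look like. The `TopologyHistory` class and the shape of a snapshot (a plain dict) are hypothetical and chosen only to illustrate the retention and lookup idea.

```python
from bisect import bisect_right, insort
from datetime import date


class TopologyHistory:
    """Daily topology snapshots with a current + previous month retention."""

    def __init__(self):
        self._days = []       # sorted snapshot dates
        self._snapshots = {}  # date -> topology (any object)

    def add(self, day: date, topology) -> None:
        if day not in self._snapshots:
            insort(self._days, day)
        self._snapshots[day] = topology

    def at(self, day: date):
        """Returns the most recent snapshot taken on or before `day`."""
        index = bisect_right(self._days, day)
        if index == 0:
            raise LookupError(f"no topology snapshot on or before {day}")
        return self._snapshots[self._days[index - 1]]

    def prune(self, today: date) -> None:
        """Drops snapshots older than the first day of the previous month."""
        if today.month == 1:
            cutoff = date(today.year - 1, 12, 1)
        else:
            cutoff = date(today.year, today.month - 1, 1)
        for day in self._days:
            if day < cutoff:
                del self._snapshots[day]
        self._days = [day for day in self._days if day >= cutoff]
```

With this layout a re-calculation for a past day would call `at(day)` instead of using the topology current at computation time; a longer retention period only changes the cutoff in `prune`.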

13 Data Retrieval
- Downloads downtimes from GOCDB
- Downtimes change often
- Runs before each A/R computation is performed
- Downtime is used in both the reliability and the availability computations
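The slides do not spell out the formulas, so the following is a sketch under an assumption: it uses the commonly cited definitions in which availability is uptime over the time with known status, and reliability additionally excludes scheduled downtime from the denominator. The status labels and the hourly-list input are illustrative.

```python
from collections import Counter


def availability_reliability(hourly_statuses):
    """Computes (availability, reliability) from a list of hourly statuses.

    Assumed labels: "UP", "DOWN", "SCHEDULED_DOWNTIME", "UNKNOWN".
    Returns None for a value whose denominator would be zero.
    """
    counts = Counter(hourly_statuses)
    up = counts["UP"]
    known = len(hourly_statuses) - counts["UNKNOWN"]
    # Availability: scheduled downtime counts against the site.
    availability = up / known if known else None
    # Reliability: scheduled downtime is excluded from the denominator.
    reliable_hours = known - counts["SCHEDULED_DOWNTIME"]
    reliability = up / reliable_hours if reliable_hours else None
    return availability, reliability
```

For a day with 18 hours up, 2 down, 2 in scheduled downtime and 2 unknown, this gives an availability of 18/22 and a reliability of 18/20.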

14 Data Prefiltering
- Pre-filters and validates data retrieved by the Log Consumer
- For each metric result it checks the SAM instance, the NGI and the profile
- The cleaned-up data for each metric result are in the form: Timestamp, metric, type, host, status, VO, FQAN, profile
- Runs before each A/R computation is performed
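As a rough sketch of the validation step above: each raw result is checked against a set of allowed (SAM instance, NGI, profile) tuples and, if it passes, reduced to the eight-field record listed on the slide. The raw field names, the example tuple and the set of valid statuses are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class CleanResult(NamedTuple):
    """Cleaned-up record, in the field order given on the slide."""
    timestamp: str
    metric: str
    type: str
    host: str
    status: str
    vo: str
    fqan: str
    profile: str


# Hypothetical allowed (SAM instance, NGI, profile) tuples.
ALLOWED_TUPLES = {("sam.example.org", "NGI_EXAMPLE", "EXAMPLE_PROFILE")}

# Assumed status vocabulary (Nagios-style).
VALID_STATUSES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}


def prefilter(raw: dict) -> Optional[CleanResult]:
    """Returns a cleaned record, or None if the raw result is rejected."""
    key = (raw.get("sam_instance"), raw.get("ngi"), raw.get("profile"))
    if key not in ALLOWED_TUPLES:
        return None
    if raw.get("status") not in VALID_STATUSES:
        return None
    try:
        return CleanResult(*(raw[name] for name in CleanResult._fields))
    except KeyError:
        return None  # a required field is missing
```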

15 Computation Engine
- Built on top of Hadoop PIG
- Status of Service Endpoints (SE) computed per day on 24h time slices (hourly status) for each POEM profile
- POEM profiles are used for aggregating metric results into status
- Calculating SE status for 48 hours back
- The reason for this is that monitoring messages can be seen on the Message Broker Network up to 24 hours after the execution time of the monitoring probes
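To make the hourly aggregation concrete, here is a plain-Python sketch (the pilot itself runs on Hadoop). It assumes a simple rule that the slides do not state: each metric keeps its last reported status until a new result arrives, a metric with no result yet counts as UNKNOWN, and the endpoint takes the most severe status among the metrics of the profile. The severity ordering is likewise an assumption.

```python
# Assumed severity ordering; higher means worse.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def hourly_status(results, profile_metrics, hours=24):
    """Builds an hourly status timeline for one service endpoint.

    results: iterable of (hour, metric, status), with hour in 0..hours-1,
             given in chronological order.
    profile_metrics: non-empty sequence of metric names in the profile.
    """
    by_hour = {}
    for hour, metric, status in results:
        if metric in profile_metrics:
            # A later result in the same hour overwrites an earlier one.
            by_hour.setdefault(hour, {})[metric] = status

    latest = {}  # metric -> last status seen so far
    timeline = []
    for hour in range(hours):
        latest.update(by_hour.get(hour, {}))
        statuses = [latest.get(m, "UNKNOWN") for m in profile_metrics]
        timeline.append(max(statuses, key=SEVERITY.__getitem__))
    return timeline
```

Because messages may arrive up to 24 hours late, a daily run would recompute this timeline for the previous days as well, so that late results are picked up.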

16 Computation Engine
- Service Flavors (SF) Availability & Reliability computed per day on 24h time slices (hourly status)
- Availability profiles are used for aggregating SE status results into SF status
- The engine is extensible to include other profiles at this stage as well
- Calculating SF status for 48 hours back
- As stated in the previous slide, the reason for this is that monitoring messages can be seen on the Message Broker Network up to 24 hours after the execution time of the monitoring probes

17 Computation Engine
- Availability & Reliability of NGI Sites computed daily
- Availability profiles are used for aggregating SF A/R results into Site A/R
- The engine is extensible to include other profiles at this stage as well
- Calculating Site A/R for 48 hours back
- As stated in the previous slide, the reason for this is that monitoring messages can be seen on the Message Broker Network up to 24 hours after the execution time of the monitoring probes
- Full month results on the 3rd day of the next month
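One possible reading of how an availability profile combines statuses, sketched for a single hour. The rule used here is an assumption, not something the slides define: a service flavor is up if at least one of its endpoints is OK, flavors inside a group are alternatives (OR), and every group in the profile must be up (AND). The flavor names in the usage note are illustrative.

```python
def flavor_up(endpoint_statuses) -> bool:
    """A service flavor is up if at least one of its endpoints is OK."""
    return any(status == "OK" for status in endpoint_statuses)


def site_up(profile, endpoints_by_flavor) -> bool:
    """Evaluates a hypothetical availability profile for one hour.

    profile: list of groups, each group a list of service flavor names.
             Flavors in a group are ORed; groups are ANDed.
    endpoints_by_flavor: flavor name -> list of endpoint statuses.
    """
    return all(
        any(flavor_up(endpoints_by_flavor.get(flavor, [])) for flavor in group)
        for group in profile
    )
```

For example, with `profile = [["CREAM-CE", "ARC-CE"], ["SRMv2"]]` a site with one working CREAM-CE endpoint and one working SRMv2 endpoint is up, even if it runs no ARC-CE at all. Evaluating this per hour yields the site timeline from which daily A/R figures are derived.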

18 API
- Uses mongodb as the data store
- Implemented two method calls:
- site_availability_in_profile
- ngi_availability_in_profile
- Serves data in XML and JSON
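The sketch below illustrates serving the same result set in JSON or XML. Only the method name `site_availability_in_profile` comes from the slide; its parameters, the record fields and the output layout are hypothetical, and an in-memory list stands in for the mongodb collection.

```python
import json
import xml.etree.ElementTree as ET


def site_availability_in_profile(store, profile, fmt="json"):
    """Returns the site A/R records of one profile as a JSON or XML string.

    store: list of dicts standing in for the data store (hypothetical fields:
           site, profile, date, availability, reliability).
    """
    rows = [row for row in store if row["profile"] == profile]
    if fmt == "json":
        return json.dumps({"profile": profile, "sites": rows}, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("profile", {"name": profile})
        for row in rows:
            attrs = {k: str(v) for k, v in row.items() if k != "profile"}
            ET.SubElement(root, "site", attrs)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

An `ngi_availability_in_profile` call could follow the same pattern, grouping by NGI instead of by site.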

19 User Interface
- Built on top of the Lavoisier engine
- Currently in operation providing VO and NGI Core Services A/R Reporting
- Selection of the availability type (vo, ngi, sites)
- Selection of the service group (CE, SE, TOP BDII, ALL)
- Selection of a period, then selection of the granularity (monthly, daily, hourly)
- Possibility to export in xls and pdf, and different charts

20 Pilot Service
- The pilot service is deployed on the ~okeanos cloud infrastructure
- Hadoop infrastructure in place
- Calculation for 1 month: ~2h
- Can go down to ~10 min

21 Remaining work
- Direction driven by the Requirements Gathering Task Force
- Custom Availability Profiles
- Availability / Reliability for VOs
- Custom factors in the A/R calculations
- Data Retention
- Re-calculations
- Graph visualization
- Further information in the following presentations

22 Thank you!

