Regional Grid Monitoring
Introduction & database components

Wojciech Lapka
SAM Team, CERN
EGEE’09 Conference, September 2009, Barcelona

Enabling Grids for E-sciencE
EGEE-III INFSO-RI
EGEE and gLite are registered trademarks
Outline
Introduction to the new Service Availability Monitoring System
Description of the Database Components
  – Aggregated Topology Provider (ATP)
  – Metric Description Database (MDDB)
  – Metric Results Store (Metric Store)
SAM – existing architecture
SAM – enhanced architecture
Data Flow
MyEGEE portal & iGoogle
Databases – ATP
What will be tested? -> Aggregated Topology Provider
How will it be tested? -> ?
What to do with test results? -> ?
Databases – ATP
What information is provided by the ATP?
  – Topology information containing:
      Projects (WLCG) and grid infrastructures (EGEE, OSG, NDGF)
      Sites, services, VOs and their groupings
      Downtimes
      A history of the above
Why do we need it?
  – Availability re-calculations need the history of the grid topology
  – Until now we could not name groups of arbitrary grid resources (e.g. ATLAS clouds)
  – It gives a single authoritative source of topology information
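To illustrate why a topology history matters for availability re-calculations, here is a minimal sketch of a point-in-time lookup. The record layout (`ServiceRecord`, `valid_from`, `valid_to`) and the host and site names are hypothetical and are not taken from the ATP schema. The sketch only shows the general idea of keeping validity intervals so that an old time period can be evaluated against the topology that was in force then.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ServiceRecord:
    """One historical version of a service's place in the topology (hypothetical layout)."""
    hostname: str
    flavour: str
    site: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the record is still current


def topology_at(history: List[ServiceRecord], when: datetime) -> List[ServiceRecord]:
    """Return the records that were valid at the instant `when`."""
    return [
        r for r in history
        if r.valid_from <= when and (r.valid_to is None or when < r.valid_to)
    ]


# Illustrative data: the same host is reassigned from one site to another.
history = [
    ServiceRecord("ce01.example.org", "CE", "SITE-A",
                  datetime(2009, 1, 1), datetime(2009, 6, 1)),
    ServiceRecord("ce01.example.org", "CE", "SITE-B",
                  datetime(2009, 6, 1), None),
]

march = topology_at(history, datetime(2009, 3, 15))
september = topology_at(history, datetime(2009, 9, 21))
```

With this data, a re-calculation for March attributes the host to `SITE-A`, while one for September attributes it to `SITE-B`.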
ATP – why do we need it?
Current flow of Grid topology data across various monitoring tools: [diagram]
ATP – why do we need it?
Streamlined grid topology data flow using the ATP: [diagram]
ATP – data sources
[diagram]
Components: BDII, OSG IM, GOCDB, CIC Portal, Gstat 2.0, WLCG MOU Portal, project feeds, VO feeds, ATP sync, Aggregated Topology Provider
Data labels: OSG topology & downtimes, EGEE topology & downtimes, installed capacity, VO cards, VO / service mappings, ALICE VOboxes
ATP – status
What do we have today?
  – MySQL and Oracle versions
  – Synchronizer
  – A programmatic interface to retrieve ATP information (XML/JSON):
ATP – status
What needs to be added?
  – History tables to record changes in topology information
  – Programmatic interface: parameterised queries (similar to SAM PI)
Databases – MDDB
What will be tested? -> Aggregated Topology Provider
How will it be tested? -> Metric Description Database
What to do with test results? -> ?
Databases – MDDB
What information is provided by the MDDB?
  – Metrics used to test the Grid infrastructure
  – Profiles: combinations of metrics used to compute different availabilities and to configure Nagios installations
Why do we need it?
  – More flexible availability calculations
      Example: CMS would like to test Tier-1 and Tier-2 sites differently
  – To maintain a history of which metrics and calculations were valid at each point in time
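The idea of a profile as a named combination of metrics can be sketched as follows. The profile names, the metric names and the rule "available only if every metric in the profile is OK" are all assumptions made for illustration. The slides do not specify the actual availability algorithm or the MDDB data model.

```python
from typing import Dict, Set

# Hypothetical profiles: each name maps to the set of metrics that count for it.
PROFILES: Dict[str, Set[str]] = {
    "CMS_TIER1": {
        "org.example.CE-JobSubmit",
        "org.example.SRM-Put",
        "org.example.SRM-Get",
    },
    "CMS_TIER2": {
        "org.example.CE-JobSubmit",
    },
}


def is_available(profile: str, latest_status: Dict[str, str]) -> bool:
    """Assumed rule: available only if every metric in the profile is OK.

    A metric with no reported status is treated as not OK.
    """
    return all(latest_status.get(metric) == "OK" for metric in PROFILES[profile])


# The same set of results evaluated under two different profiles.
latest = {
    "org.example.CE-JobSubmit": "OK",
    "org.example.SRM-Put": "CRITICAL",
    "org.example.SRM-Get": "OK",
}

tier1_ok = is_available("CMS_TIER1", latest)
tier2_ok = is_available("CMS_TIER2", latest)
```

Here the failing storage metric makes the site unavailable under the stricter Tier-1 profile but not under the Tier-2 profile, which is the kind of flexibility the CMS example asks for.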
MDDB – Architecture
[diagram] Central MDDB, MDDB Sync, Local Cache
MDDB – Status
What do we have today?
  – MySQL and Oracle versions
  – Integration with ATP
  – Web user interface
  – A programmatic interface to retrieve MDDB information (JSON)
What needs to be added?
  – Synchronizer between the central DB and local (ROC) caches
  – Interface for populating and querying profiles
  – Profiles: mapping with grid resources
Databases – Metric Store
What will be tested? -> Aggregated Topology Provider
How will it be tested? -> Metric Description Database
What to do with test results? -> Metric Results Store
Databases – Metric Store
What information is provided by the Metric Store?
  – Metric results for service end-points in the grid infrastructure
  – Status changes for service end-points in the infrastructure
What do we have today?
  – MySQL and Oracle versions:
      Integration with MDDB and ATP
      Per-service status change calculation for profiles
      Data loader
  – Data from 11 ROCs is being loaded into the Central Metric Store:
      Some records are rejected (mainly because service end-points are not defined correctly in GOCDB)
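A status-change calculation can be pictured as collapsing a time-ordered stream of metric results into only the points where the status differs from the previous one. The sketch below assumes results arrive as `(timestamp, status)` pairs for a single service end-point. That representation and the function name are illustrative and do not describe the Metric Store's actual implementation.

```python
from typing import Iterable, List, Tuple

Result = Tuple[int, str]  # (timestamp, status); an assumed representation


def status_changes(results: Iterable[Result]) -> List[Result]:
    """Keep only the results whose status differs from the preceding one.

    `results` must already be sorted by timestamp.
    """
    changes: List[Result] = []
    previous = None
    for timestamp, status in results:
        if status != previous:
            changes.append((timestamp, status))
            previous = status
    return changes


# Five raw results for one end-point collapse to three status changes.
raw = [(0, "OK"), (10, "OK"), (20, "CRITICAL"), (30, "CRITICAL"), (40, "OK")]
changes = status_changes(raw)
```

Storing the changes rather than every raw result keeps the history compact while still allowing the status at any instant to be recovered.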
Metric Store – status
What needs to be added?
  – MySQL: tuning of the DB (e.g. table partitioning)
  – Programmatic interface: parameterised queries
  – Purging mechanism
  – Alerting mechanism integrated with Nagios (e.g. when not enough metric results are received in a given period of time)
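The alerting condition mentioned above (too few metric results in a given period) could be checked along these lines. The function name, the window length and the threshold are assumptions chosen for illustration. The slides list this mechanism as still to be added and do not describe its design.

```python
from datetime import datetime, timedelta
from typing import Iterable


def too_few_results(timestamps: Iterable[datetime], now: datetime,
                    window: timedelta, minimum: int) -> bool:
    """Return True if fewer than `minimum` results arrived in the last `window`."""
    cutoff = now - window
    recent = sum(1 for t in timestamps if cutoff < t <= now)
    return recent < minimum


# Illustrative data: results received 5, 50 and 130 minutes ago.
now = datetime(2009, 9, 21, 12, 0)
received = [now - timedelta(minutes=m) for m in (5, 50, 130)]

# Only two results fall inside the last hour, so a minimum of three is not met.
alert_needed = too_few_results(received, now, timedelta(hours=1), minimum=3)
```

A check built this way could report its outcome through the usual Nagios status levels (OK, WARNING, CRITICAL, UNKNOWN).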
Central Metric Store Population
[diagram] Labels: active & passive check results, metric & profile definition, service definition
Outline
Introduction to the new Service Availability Monitoring System
Description of the Database Components
  – Aggregated Topology Provider (ATP)
  – Metric Description Database (MDDB)
  – Metric Results Store (Metric Store)
Publicity
Publicity – Demo
Watch our demo and vote for it:
  – Tuesday 16:30-17:00
  – Wednesday lunch
  – (YouTube)
  –
Acknowledgments
Thanks to the following people for their contributions:
  – James Casey (CERN)
  – Emir Imamagic (SRCE)
  – Pradyumna Joshi (BARC)
  – Rajesh Kalmady (BARC)
  – Vaibhav Kumar (BARC)
  – Steve Traylen (CERN)
SAM Team at CERN:
  – John Shade
  – David Collados
  – Karolis Eigelis
  – Judit Novak
  – Konstantin Skaburskas
Summary
The new enhanced SAM system, based on Nagios (a very popular and powerful open-source tool), will:
  – Simplify the transition to the EGI era
  – Help site administrators with fabric monitoring
ATP, acting as a single authoritative information aggregator, will simplify the job of assimilating grid resource information
MDDB will allow flexible availability calculations
The Metric Results Store will help the MyEGEE portal display the test results
Demo:
Thank you!
Questions?
egee3-operations-automation-