SEE-GRID operational tools and Grid services improvements
Antun Balaz, WP3 Leader, Institute of Physics, Belgrade
EGEE/WLCG Operations Workshop 2007, Stockholm, June 2007
The SEE-GRID-2 initiative is co-funded by the European Commission under the FP6 Research Infrastructures contract no.
Overview
- SEE-GRID WP3
- Infrastructure
- Operations
- Operational tools: HGSM and HGSM+SAM integration; WiatG; BBmSAM and BBmobileSAM
- WP3 ongoing work
SEE-GRID WP3
- Develop the next-generation SEE-GRID infrastructure
  - Next generation of EGEE middleware (gLite) and services
- Support the deployment and operation of the Resource Centres
  - Monitoring, helpdesk, overall upgrade of the infrastructure
- Network resource provision and assurance, in close cooperation with the SEEREN2 project
  - Bandwidth-on-Demand requirements
- CA and RA guidelines and deployment
  - Catch-all Certification Authority (CA)
  - Per-country CA deployment
- User portal deployment and operations
  - P-GRADE
Infrastructure
Infrastructure status (1)
SEE-GRID core services:
- Catch-all Certification Authority, which enables regional sites to obtain user and host certificates
- Virtual Organization Membership Service (VOMS), the authorization system for the SEE-GRID Virtual Organisation (VO), supporting groups and roles
- Workload Management Services (lcg-RB and glite-WMSLB) and Information Service (BDII): several instances deployed for failover
- MyProxy is operational and supports certificate renewal
- FTS deployed and used in production
Infrastructure status (2)
The SEE-GRID infrastructure currently contains the following resources:
- 31 sites in SEE-GRID production
- 5 sites in certification phase (AL + HR + 2 RO)
- CPUs: ~950 total; Storage: TB
- gLite assessment done with positive results; all sites upgraded (GLITE-3_0_2)
- glite-CE deployed at several sites; assessment results inconclusive, and the service is probably not stable enough for production
- glite-WMSLB deployed at several sites; the assessment shows it is not as stable as lcg-RB, but it is actively used because of its new features
- WN deployment closely follows the latest gLite developments:
Operations
Operational procedures
- Distributed operations
- Pilot SLA established
- Monitoring and accounting tools
- Helpdesk ticket procedures
  - Generic support group for users, TPM-like (monitoring open tickets created by users, solving the simple ones, routing the rest, etc.)
  - Country-level user support groups, associated with country-level mailboxes
- GOOD shifts introduced, with positive initial results
- Ticket handling: response times need to be improved!
- SEE-GRID Wiki with detailed information for site administrators
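The ticket flow described above (a generic TPM-like group that solves simple tickets itself and routes the rest to country-level groups) can be sketched as follows. This is a minimal illustration only: the `Ticket` fields, the `simple` flag and the mailbox addresses are hypothetical, and the country of the affected site is assumed to be looked up in HGSM.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: int
    country: str   # country of the affected site, assumed to come from HGSM
    simple: bool   # hypothetical flag: the generic group can solve it directly


# Hypothetical country-level mailboxes; real addresses are not given in the slides.
COUNTRY_MAILBOXES = {
    "BA": "support-ba@example.org",
    "RS": "support-rs@example.org",
}
GENERIC_GROUP = "generic-support@example.org"


def route(ticket: Ticket) -> str:
    """Return the mailbox that should handle the ticket."""
    if ticket.simple:
        # The generic (TPM-like) group keeps and solves simple tickets.
        return GENERIC_GROUP
    # Route to the country-level group; fall back to the generic group
    # if that country has no dedicated support group.
    return COUNTRY_MAILBOXES.get(ticket.country, GENERIC_GROUP)
```

Keeping the generic group as the fallback means no ticket is ever left without an owner, even for countries that have not yet set up a support mailbox.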
SLA conformance
- Improvements seen after the first quarter of pilot SLA enforcement
Operational & monitoring tools (1)
Deployment status (tool: responsible party):
- Hierarchical Grid Site Management (HGSM): Turkey
- Service Availability Monitoring (SAM), including porting to MySQL: Bosnia and Herzegovina, with CERN support
- Helpdesk: Romania
- BBmSAM: Bosnia and Herzegovina
- GridICE: FYR of Macedonia
- SEE-GRID GoogleEarth: Turkey + Gidon Moont
- SEE-GRID GoogleMaps: Turkey
- Global Grid Information Monitoring System (GStat): Min-Hong Tsai
- Relational Grid Monitoring Architecture (R-GMA): Bulgaria
- Nagios: Bulgaria
- Real Time Monitor (RTM): Gidon Moont and Turkey (HGSM)
- MONitoring Agents using a Large Integrated Services Architecture (MonALISA): Romania
- What is at the Grid (WiatG): CERN, with support from Serbia
Operational & monitoring tools map
[Figure: map of operational and monitoring tools: HGSM, Helpdesk, BDII, R-GMA, SAM, GStat (Taiwan), VOMS, RTM (UK), Google Maps, BBmSAM, GridICE, MonALISA, Nagios, WiatG]
Operational & monitoring tools (2)
Integration status:
- HGSM+SAM, HGSM+BBmSAM: automatic creation of the list of sites to be tested
- HGSM+BDII: automatic creation of the list of sites in the infrastructure
- HGSM+GStat: automatic creation of the list of sites to be monitored
- HGSM+RTM, HGSM+R-GMA: automatic creation of the list of sites for monitoring and accounting
- VOMS+Helpdesk: automatic creation of new user accounts when accessing the helpdesk; certificate-based access to the helpdesk
[Figure: integration diagram with HGSM, Helpdesk, BDII, R-GMA, SAM, GStat, VOMS, RTM, Google Maps and BBmSAM]
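Each HGSM integration above boils down to deriving a tool-specific site list from the single HGSM record set instead of maintaining it by hand. A minimal sketch, assuming a hypothetical `Site` record with a `status` field and a made-up `TOOL_POLICY` table deciding which statuses each tool receives; the real HGSM schema and selection rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    status: str  # hypothetical HGSM status values: "production", "certification", "closed"


# Hypothetical policy: which site statuses end up in each tool's site list.
TOOL_POLICY = {
    "SAM": {"production", "certification"},  # test candidate sites before production
    "BDII": {"production"},
    "GStat": {"production"},
}


def site_list(sites, tool):
    """Derive one tool's site list from the HGSM records (sorted, no duplicates)."""
    wanted = TOOL_POLICY[tool]
    return sorted({s.name for s in sites if s.status in wanted})
```

Because every list is computed from the same records, a site moving from certification to production shows up consistently in all tools on the next export.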
HGSM database
- SEE-GRID GOCDB, introduced as a lightweight version of GOCDB
- Its format can easily be changed when necessary and adapted to regional needs
- Custom exports can be provided on demand for operational tool and application developers
- Contains static information about all sites
- Developed and maintained by TUBITAK-ULAKBIM, Turkey
- Used by EUMedGRID; other regional projects have expressed interest
HGSM+SAM Integration (1)
- Integration done in collaboration between TUBITAK-ULAKBIM and U of Banjaluka
- Periodic export of HGSM data to an XML file
  - The XML file is a full dump of the database and represents all relevant tables
  - The generated data is universal and can be used for other purposes
- Periodic import of HGSM data, first into a local MySQL DB and then into the Oracle XE SAM DB
  - Only SAM-relevant data is imported into Oracle
  - Other data stays in the local MySQL DB for other uses, so as not to burden the Oracle DB
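The export/import split can be sketched with Python's standard `xml.etree.ElementTree`. The dump layout (`<table name="...">` elements containing `<row>` elements), the table names and the `SAM_TABLES` set are assumptions for illustration, not the actual HGSM export schema; the point is that the full dump lands in the local MySQL copy while only the SAM-relevant subset continues to Oracle.

```python
import xml.etree.ElementTree as ET

# Hypothetical dump format; the real HGSM export schema is not described here.
SAMPLE_DUMP = """<hgsm>
  <table name="site">
    <row><name>SITE-A</name><country>BA</country></row>
  </table>
  <table name="node">
    <row><hostname>ce.example.org</hostname><site>SITE-A</site><service>CE</service></row>
  </table>
  <table name="contact">
    <row><email>admin@example.org</email><site>SITE-A</site></row>
  </table>
</hgsm>"""

SAM_TABLES = {"site", "node"}  # assumption: the tables SAM actually needs


def parse_dump(xml_text):
    """Return {table_name: [row_dict, ...]} for every table in the full dump."""
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root.findall("table"):
        rows = [{field.tag: (field.text or "") for field in row}
                for row in table.findall("row")]
        tables[table.get("name")] = rows
    return tables


def sam_subset(tables):
    """Everything goes to the local MySQL copy; only these tables go on to Oracle."""
    return {name: rows for name, rows in tables.items() if name in SAM_TABLES}
```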
HGSM+SAM Integration (2)
[Figure: HGSM+SAM data flow between HGSM (MySQL), XML export (PHP), local copy of HGSM (MySQL), SAM sync (PHP), SAM DB (Oracle), SAM portal (Python) and BBmSAM (PHP)]
HGSM+SAM Integration (3): planned improvements
- Currently SAM retrieves node/service information from a mix of different sources (the "official" way)
- All this data is already present in HGSM
- The intention is for SAM to communicate directly and only with HGSM, since HGSM is considered the reference copy of the data
- Having a copy of the HGSM DB in the same place enables further development of the (BBm)SAM portal:
  - Checking whether someone is a site administrator and allowing them to request out-of-order tests
  - Soft real-time tracking of test progress
  - Exporting data in any structured form: moving to XML and/or
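The planned administrator check for out-of-order tests could work roughly as follows. This sketch assumes site administrators are identified by their certificate subject DN and listed per site in the local HGSM copy; `SITE_ADMINS`, `test_queue` and `request_out_of_order_test` are hypothetical names, and a real portal would take the DN from the client certificate presented over HTTPS.

```python
from collections import deque

# Hypothetical: administrators per site, identified by certificate subject DN,
# as they might be read from the local HGSM copy.
SITE_ADMINS = {
    "SITE-A": {"/C=XX/O=Example/CN=Alice Admin"},
}

# Pending out-of-order test requests as (site, node) pairs.
test_queue = deque()


def request_out_of_order_test(user_dn, site, node):
    """Queue an immediate test of `node` only if `user_dn` administers `site`."""
    if user_dn not in SITE_ADMINS.get(site, set()):
        return False  # not an administrator of this site: request refused
    test_queue.append((site, node))
    return True
```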
WiatG: new BDII operations tool
- Web application for visualization of BDII information
- Highly responsive because it uses AJAX
  - Partial refresh (the client receives the page part by part)
  - Asynchronous (the server processes requests in the background, so several requests can be sent at once)
- The current version looks for: CE, gCE, RB, gRB, SE, LFC, FTS and GridICE
- Used as an operational tool for site monitoring
- Documentation available:
- Supports several regional projects (EUMedGRID, EUChinaGrid, EELA and BalticGrid), as well as the LHC VOs and OPS
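WiatG does this with AJAX in the browser; the same pattern (issue all queries at once, deliver each page fragment as soon as its query finishes) can be illustrated server-side with Python's `asyncio.as_completed`. The `query_bdii` coroutine is a hypothetical stand-in that only simulates latency instead of issuing a real LDAP query, and the HTML fragments are placeholders.

```python
import asyncio

SERVICE_TYPES = ["CE", "gCE", "RB", "gRB", "SE", "LFC", "FTS", "GridICE"]


async def query_bdii(service_type):
    """Hypothetical stand-in for an LDAP query against a BDII."""
    await asyncio.sleep(0.01 * len(service_type))  # simulate varying latency
    return f"<table id='{service_type}'>...</table>"


async def render_page(send_fragment):
    """Start all queries concurrently and push each fragment as soon as it is ready."""
    tasks = [asyncio.create_task(query_bdii(t)) for t in SERVICE_TYPES]
    for finished in asyncio.as_completed(tasks):
        # Fast queries are delivered first; slow ones do not block the page.
        send_fragment(await finished)
```

For example, `asyncio.run(render_page(print))` prints the eight fragments in completion order rather than in the order they were requested.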
[Figure: WiatG architecture]
[Figure: WiatG in action]
Further development of WiatG
- Addition of new services (MyProxy, local LFC, VO software tags, …)
- Correctness check of site-BDII data
- Alarms dashboard
- Automatic creation of tickets
- Development of the new tool "What should be at the Grid" (WsbatG)
  - Based on the site configuration exported from HGSM (SEE-GRID GOCDB)
  - Visually identical to WiatG, showing the expected state of the BDII
  - Comparison of WiatG and WsbatG data
  - Alarms dashboard
  - Automatic creation of tickets
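The WiatG/WsbatG comparison amounts to a per-site set difference between what HGSM says should be published and what the BDII actually publishes. A minimal sketch, assuming both sides have already been reduced to a hypothetical `{site: set of "TYPE host" strings}` form; each mismatch becomes an alarm message that could feed the dashboard or a ticket.

```python
def compare(expected, published):
    """Compare expected (WsbatG/HGSM) with published (WiatG/BDII) services.

    Both arguments map site name -> set of service strings.
    Returns a list of alarm messages, one per mismatch.
    """
    alarms = []
    for site in sorted(expected.keys() | published.keys()):
        exp = expected.get(site, set())
        pub = published.get(site, set())
        # Registered in HGSM but missing from the BDII.
        for svc in sorted(exp - pub):
            alarms.append(f"{site}: {svc} expected (HGSM) but not published in BDII")
        # Published in the BDII but unknown to HGSM.
        for svc in sorted(pub - exp):
            alarms.append(f"{site}: {svc} published in BDII but not registered in HGSM")
    return alarms
```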
BBmSAM portal
- Created for SLA monitoring
  - Generates site availability statistics according to several criteria
  - Overview (HTML) and full dump (CSV) of the data available
- Extended into a full SAM portal
  - Availability over the last 24h for all sites/services
  - Latest results per service
  - History for nodes/services
- Currently being ported to MySQL
- Developed by U of Banjaluka
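A 24-hour availability figure and its CSV dump could be produced along these lines. This is a simplified sketch: it assumes availability is the fraction of passed test runs within the window, whereas SAM's own availability calculation aggregates critical tests over time and may give different numbers; the `(timestamp, site, status)` tuple format is also an assumption.

```python
import csv
import io
from datetime import timedelta


def availability(results, now, window=timedelta(hours=24)):
    """results: iterable of (timestamp, site, status). Returns {site: fraction OK}."""
    counts = {}
    for ts, site, status in results:
        if now - window <= ts <= now:  # keep only runs inside the window
            ok, total = counts.get(site, (0, 0))
            counts[site] = (ok + (status == "OK"), total + 1)
    return {site: ok / total for site, (ok, total) in counts.items()}


def to_csv(avail):
    """Full dump of per-site availability as CSV text (percentages, one decimal)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["site", "availability_percent"])
    for site in sorted(avail):
        writer.writerow([site, f"{100 * avail[site]:.1f}"])
    return buf.getvalue()
```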
[Figure: BBmSAM as a SAM portal (1)]
[Figure: BBmSAM as a SAM portal (2)]
[Figure: BBmSAM and SLA (1)]
[Figure: BBmSAM and SLA (2)]
BBmobileSAM
- Optimized for small-screen devices and low bandwidth
- Site filtering:
  - A single site (example: BA-01-ETFBL)
  - All sites in a country (example: BA)
  - All SEE-GRID sites
- Three levels of detail:
  - Basic level (critical test status for all nodes and services)
  - Single test level (status of all tests for all nodes and services)
  - Single test level with timestamps
- Detail levels work independently of the site filter, so detailed results can also be produced for all sites in SEE-GRID
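The site filter and the detail level are independent parameters, which is what lets any level be combined with any filter. A minimal sketch with a hypothetical `TestResult` record (its `country` field is assumed to come from HGSM); the basic level is interpreted here as showing only critical tests, which may not match BBmobileSAM exactly.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    site: str       # e.g. "BA-01-ETFBL"
    country: str    # assumed to be taken from HGSM, e.g. "BA"
    node: str
    test: str
    status: str
    timestamp: str
    critical: bool


def select_sites(results, site_filter=None):
    """site_filter: None (all SEE-GRID sites), a country code, or a full site name."""
    if site_filter is None:
        return list(results)
    return [r for r in results if site_filter in (r.site, r.country)]


def render(results, level):
    """Level 1: critical tests only; 2: all tests; 3: all tests with timestamps."""
    rows = []
    for r in results:
        if level == 1 and not r.critical:
            continue
        line = f"{r.site} {r.node} {r.test} {r.status}"
        if level == 3:
            line += f" @ {r.timestamp}"
        rows.append(line)
    return rows
```

For example, `render(select_sites(results, "BA"), 3)` gives the most detailed view for one country, and `render(select_sites(results), 3)` the same view for every site.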
WP3 ongoing work
- Optimization of site/top-level BDIIs through indexing
- SAM porting to MySQL
- WiatG/WsbatG
- HGSM improvements
- gLite-WMSLB performance and stability assessment
- Proxy renewal on RB/WMS with full VOMS capabilities