Sergio Fantinel, INFN LNL/PD


1 Sergio Fantinel, INFN LNL/PD
GridICE The eyes of the grid A monitoring tool for a Grid Operation Center by DataTAG WP4 Sergio Fantinel, INFN LNL/PD

2 GridICE Actual Implementation Outline
Monitoring scenario
Collection of info: EDG WP4 FMon Framework & GLUE/GLUE+ Schema
Discovery, info retrieval and presentation service
Real testbed experience and results
Next steps

3 Monitoring scenario
Different layers of info generation
Different points of view
Computing Element, Storage Element, Worker Node, Resource Broker, Information Index, Replica Manager, Replica Catalog […]
SERVICE checks: gatekeeper, gsiftp, gris, gdmp, RB/LB
“GRID/VO” measurements: number of total CPUs, number of free CPUs, number of running jobs, number of waiting jobs, SE free disk space
LOW LEVEL measurements: CPU load, memory usage, disk usage (per partition), network activity, number of processes, number of users (UI)
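As an illustration of the LOW LEVEL measurements, the sketch below parses text in the format of the Linux /proc/meminfo and /proc/loadavg files to obtain memory usage and CPU load. It is not the WP4 sensor code: the function names and the sample values are assumptions made for this example, and memory usage is computed simply as (MemTotal - MemFree) / MemTotal.

```python
def parse_meminfo(text):
    """Return {field: value in kB} parsed from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_used_fraction(meminfo):
    """Fraction of RAM in use, computed as (MemTotal - MemFree) / MemTotal."""
    total = meminfo["MemTotal"]
    return (total - meminfo["MemFree"]) / total


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


# Sample inputs (made up for the example); on a real worker node the text
# would come from open("/proc/meminfo").read() and open("/proc/loadavg").read().
SAMPLE_MEMINFO = "MemTotal:        1024000 kB\nMemFree:          256000 kB\n"
SAMPLE_LOADAVG = "0.50 0.75 1.00 2/150 12345\n"

mem = parse_meminfo(SAMPLE_MEMINFO)
print(memory_used_fraction(mem))
print(parse_loadavg(SAMPLE_LOADAVG))
```

Keeping the parsers separate from the file reads makes them easy to test on hosts without a /proc filesystem.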

4 GLUE+

5 Central Monitoring Database
EDG WP4 FMon Framework
[Diagram. Labels: web interface; Central Monitoring Database; information index; first discovery phase: ldap query to the GIIS (GLUE schema); second discovery phase: ldap query to the GRIS (GLUE+ schema); on the cluster head node: monitoring server (EDG-WP4 fmonserver), farm monitoring archive (write/read), information providers (run, ldif output); on each cluster worker node: EDG-WP4 monitoring agent, WP4 sensor (run), /proc filesystem (read), metric output.]
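The "ldif output" step in the diagram can be sketched as follows: an information provider turns the latest metric values into an LDIF entry that a GRIS can publish. This is a minimal sketch only. The DN, the object class and the attribute names are hypothetical placeholders, not names taken from the GLUE+ schema, and the function does not handle values that LDIF requires to be base64-encoded.

```python
def to_ldif(dn, attributes):
    """Render one directory entry as LDIF text: a dn line, then 'attr: value' lines.

    Simplification: values are assumed to be plain ASCII that needs no
    base64 encoding or line folding.
    """
    lines = [f"dn: {dn}"]
    for name, value in attributes:
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical entry: the DN, the object class and the attribute names are
# invented for this example.
entry = to_ldif(
    "HypotheticalHostId=wn001.example.org,o=grid",
    [
        ("objectClass", "HypotheticalHostMetrics"),
        ("HypotheticalCpuLoad1Min", "0.50"),
        ("HypotheticalProcessCount", "150"),
    ],
)
print(entry)
```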

6 Experiment Specific Measures Integration
VO/experiment-specific measures can be integrated and published easily. Doing so requires extending the GLUE schema and writing the experiment sensors (e.g. CMS KIN/SIM event production).
[Diagram: same layout as the EDG WP4 FMon Framework slide, with a CMS sensor in place of the WP4 sensor on each cluster worker node and a GRIS publishing the GLUE++ schema.]

7 Server Side service layout
Developed by DataTAG WP4
[Diagram. Components: GRID, WEB, GIIS, GRIS, Grid Information System (LDAP Interface), Discovery, Config, scheduler, Check, Monitoring DB, Gfx/Presentation. Arrows are labelled 1A, 1B, 1C, 2A, 2B, 3, 4A, 4B, 5A, 5B.]
1: entities discovery
2: generation of config files
3: check scheduling
4: entities info collection
5: DB info rendering
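The five numbered steps can be read as a pipeline. The sketch below chains them using in-memory dictionaries in place of the GIIS, the GRIS and the monitoring DB. All host names, metric names, the 300-second interval and the function names are assumptions made for illustration; the real services query LDAP and are not shown here.

```python
# Hypothetical stand-ins for the real directories: a GIIS listing resources
# and per-resource GRIS data. None of these names come from GridICE itself.
GIIS = {"ce01.example.org": "ComputingElement", "se01.example.org": "StorageElement"}
GRIS = {
    "ce01.example.org": {"free_cpus": 12},
    "se01.example.org": {"free_disk_gb": 840},
}


def discover(giis):
    """Step 1: list the entities to monitor."""
    return sorted(giis.items())


def generate_config(entities):
    """Step 2: produce one check definition per discovered entity."""
    return [{"host": host, "kind": kind, "interval_s": 300} for host, kind in entities]


def schedule(config):
    """Step 3: order the checks for execution (here simply by host name)."""
    return sorted(config, key=lambda check: check["host"])


def collect(checks, gris):
    """Step 4: query each entity and return rows for the monitoring DB."""
    rows = []
    for check in checks:
        for metric, value in sorted(gris[check["host"]].items()):
            rows.append((check["host"], metric, value))
    return rows


def render(rows):
    """Step 5: turn DB rows into a plain-text report."""
    return "\n".join(f"{host} {metric}={value}" for host, metric, value in rows)


report = render(collect(schedule(generate_config(discover(GIIS))), GRIS))
print(report)
```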

8 Discovery service: entities list
This is the list of entities currently tracked by the monitoring system:
Clusters
Storage Services
Worker Nodes (CL)
Computing Elements (CL)
Run Time Environments (CL)
Virtual Organizations (CE)
Storage Extents (WN)
Network Adapters (WN)
Storage Space (SE)
Storage Protocols (SE)
CL = Cluster, WN = Worker Node/host, SE = Storage Service
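The parent tags in the list describe a small containment hierarchy, which can be written down as a mapping from each entity type to its parent. This is only a sketch of the relationships as listed on the slide. Reading "CE" as Computing Element is an assumption, since the slide's legend does not define it, and the names `PARENT` and `children_of` are invented for the example.

```python
# Entity type -> parent entity type, as tagged on the slide.
# None marks a top-level entity. "Computing Elements" as the parent of
# "Virtual Organizations" assumes that CE means Computing Element.
PARENT = {
    "Clusters": None,
    "Storage Services": None,
    "Worker Nodes": "Clusters",
    "Computing Elements": "Clusters",
    "Run Time Environments": "Clusters",
    "Virtual Organizations": "Computing Elements",
    "Storage Extents": "Worker Nodes",
    "Network Adapters": "Worker Nodes",
    "Storage Space": "Storage Services",
    "Storage Protocols": "Storage Services",
}


def children_of(parent):
    """Return the entity types directly contained in the given parent type."""
    return sorted(name for name, owner in PARENT.items() if owner == parent)


print(children_of("Clusters"))
print(children_of(None))
```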

9 Data presentation service (2)
The data presentation addresses different user types: VO views, for a VO manager; site views, for a grid manager; single-entity views, for a grid/site manager (see next slides).

10 Data presentation service (3)

11 Data presentation service (4)

12 Real testbed deployment
First tests on the DataTAG testbed -> March presentation at the 1st DataTAG review
First wide deployment on the CMS/LCG-0 testbed (29/07/2003): 12 sites x 150 CPUs
bo.infn.it, cern.ch, cmsfarm1.ba.infn.it, cnaf.infn.it, hep.ph.ic.ac.uk, in2p3.fr, its.uiowa.edu, lnl.infn.it, mi.infn.it, pd.infn.it, phy.bris.ac.uk, phy.ncu.edu.tw
Started at the beginning of July on the Certification Testbed of LCG-1
Started a few days ago on the Deployment Testbed of LCG-1

13 Results
Good response from the CMS/LCG-0 testbed.
The only big problem we found when monitoring big sites is caused by the time (NTP) synchronization of the machines, so it is not a problem of the GridICE tool itself.
Thanks to the collaboration of the LCG people, we are significantly improving our monitoring tool.
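One way a monitoring server could spot badly synchronized machines is to compare the timestamp attached to each host's latest sample with its own clock. The slides do not say how the NTP problem showed up or how it was handled, so the sketch below is purely illustrative: the function name, the 60-second tolerance and the sample data are all assumptions.

```python
def skewed_hosts(samples, server_time, max_skew_s=60):
    """Return hosts whose reported timestamp differs from server_time
    by more than max_skew_s seconds, in either direction."""
    return sorted(
        host for host, timestamp in samples.items()
        if abs(timestamp - server_time) > max_skew_s
    )


# Hypothetical latest-sample timestamps (seconds since the epoch) per host.
SERVER_TIME = 1_059_480_000
SAMPLES = {
    "wn001.example.org": SERVER_TIME - 5,     # within tolerance
    "wn002.example.org": SERVER_TIME + 900,   # clock running ahead
    "wn003.example.org": SERVER_TIME - 3600,  # clock far behind
}

print(skewed_hosts(SAMPLES, SERVER_TIME))
```

A host that is merely slow to report would also be flagged by this check, so it indicates a possible clock problem rather than proving one.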

14 Next steps, short term
Check plug-in refactoring: we made some tests with LDAP, and to improve performance we must aggregate the queries (fewer queries, more data transferred per query).
Data reduction through the activation of thresholds.
We are considering some kind of caching of the last data pushed into the DB, to reduce the load on the DB.
DB schema improvement: dynamic discovery of the GRIS URL (at the moment via GlueInformationServiceURL).
Introduction of new components: CESEBind, SECEBind.
Activation of service checking (GRIS, GIIS, gridftp, …).
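Two of these ideas can be sketched briefly. Query aggregation can use a single LDAP filter with the OR operator `(|...)` in place of one query per object class, and threshold-based data reduction can be combined with a cache of the last value pushed, so that a new value is written to the DB only when it differs enough from the cached one. This is a hedged sketch, not the planned GridICE design: the class and function names, the threshold value and the choice of object classes are assumptions.

```python
def or_filter(object_classes):
    """Build one LDAP search filter that matches any of the given object
    classes, so that a single query can replace several separate ones."""
    clauses = "".join(f"(objectClass={name})" for name in object_classes)
    return f"(|{clauses})"


class ThresholdCache:
    """Remember the last value stored per key and suppress small changes."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last = {}  # key -> last value actually pushed to the DB

    def should_store(self, key, value):
        """Return True if value must be written to the DB, updating the cache."""
        previous = self.last.get(key)
        if previous is not None and abs(value - previous) < self.threshold:
            return False  # change is below the threshold: skip the DB write
        self.last[key] = value
        return True


# Object class names chosen for illustration.
print(or_filter(["GlueCE", "GlueSE"]))

cache = ThresholdCache(threshold=5)
decisions = [cache.should_store("ce01/running_jobs", v) for v in (50, 52, 60)]
print(decisions)
```

Note that the comparison is made against the last stored value, not the last observed one, so a slow drift is still recorded once it exceeds the threshold.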

15 Next steps, short term (2)
Grid Collective Service Monitoring (e.g. edg-broker, edg-replica-location-service)
Job Monitoring at queue level (some open issues, e.g. VO)
Native R-GMA support as GIS: we need a working and stable testbed with R-GMA as GIS; extend the CE GIN to support the new metrics.
Host roles (via GlueHostService), in order to associate the service state with the proper host state
Support for multiple Information Indexes (back-up Information Index)

