Sergio Fantinel, INFN LNL/PD


GridICE: The eyes of the grid
A monitoring tool for a Grid Operation Center, by DataTAG WP4

GridICE Actual Implementation: Outline
- Monitoring scenario
- Collection of info: EDG WP4 FMon Framework & GLUE/GLUE+ Schema
- Discovery, info retrieval and presentation service
- Real testbed experience and results
- Next steps

Monitoring scenario
- Different layers of info generation
- Different points of view
- Computing Element, Storage Element, Worker Node, Resource Broker, Information Index, Replica Manager, Replica Catalog […]
- SERVICE checks: gatekeeper, gsiftp, gris, gdmp, RB/LB, …
- "GRID/VO" measurements: number of total CPUs, number of free CPUs, number of running jobs, number of waiting jobs, SE free disk space
- LOW LEVEL measurements: CPU load, memory usage, disk usage (per partition), network activity, number of processes, number of users (UI), …

GLUE+

EDG WP4 FMon Framework (diagram)
- Monitoring server: Central Monitoring Database, web interface
- First discovery phase: ldap query to the GIIS (GLUE schema) information index
- Second discovery phase: ldap query to the GRIS (GLUE+ schema) on each cluster head node
- Cluster head node: EDG-WP4 fmonserver (write) -> farm monitoring archive -> (read) information providers, run to produce ldif output for the GRIS
- Cluster worker node: EDG-WP4 monitoring agent (run, read metric output) -> WP4 sensor -> /proc filesystem
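The sensor-to-LDIF chain on this slide can be illustrated with a minimal sketch. It assumes a sensor that parses the text of `/proc/loadavg` and an information provider that renders the result as an LDIF entry. The function names, the DN, and the attribute names (`exampleLoadLast1Min` and so on) are hypothetical and are not taken from the WP4 sensors or the GLUE+ schema.

```python
def parse_loadavg(text):
    """Parse the content of /proc/loadavg into the three load averages."""
    fields = text.split()
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
    }


def to_ldif(dn, attributes):
    """Render one entry as LDIF text: a dn line followed by 'name: value' lines."""
    lines = [f"dn: {dn}"]
    for name, value in attributes.items():
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample text in the format of /proc/loadavg (on a Linux host, read the file instead).
    sample = "0.42 0.36 0.30 1/123 4567\n"
    metrics = parse_loadavg(sample)
    # Hypothetical DN and attribute names, for illustration only.
    entry = to_ldif(
        "cn=wn01.example.org,o=example",
        {
            "exampleLoadLast1Min": metrics["load1"],
            "exampleLoadLast5Min": metrics["load5"],
            "exampleLoadLast15Min": metrics["load15"],
        },
    )
    print(entry)
```

Keeping the parser separate from the file read lets the sensor logic be checked on any machine, not only on a worker node with a `/proc` filesystem.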

Experiment Specific Measures Integration
- Publication of VO/experiment measures can be integrated easily.
- The GLUE schema must be extended and the experiment sensors written (e.g. CMS KIN/SIM event production).
- Diagram labels, cluster head node: EDG-WP4 fmonserver (write), farm monitoring archive, information providers (run, read, ldif output), GRIS (GLUE++ schema)
- Diagram labels, cluster worker node: EDG-WP4 monitoring agent (run, read metric output), CMS sensor, /proc filesystem

Server-side service layout
Diagram components: GRID, WEB, Discovery, Config, scheduler, Check, Monitoring DB, Gfx/Presentation, GIIS, GRIS
Numbered flows in the diagram (1A, 1B, 1C, 2A, 2B, 3, 4A, 4B, 5A, 5B):
1: entities discovery
2: generation of config files
3: check scheduling
4: entities info collection
5: DB info rendering
Legend: Grid Information System; LDAP Interface; Developed by DataTAG WP4
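The five numbered steps can be read as a simple pipeline. The sketch below is one possible reading, not the GridICE implementation: the class `MonitoringServer`, its method names, and the in-memory dictionaries that stand in for GIIS/GRIS LDAP queries and for the monitoring DB are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringServer:
    # Stand-in for a GIIS query result: site name -> list of GRIS URLs.
    giis_entries: dict
    # Stand-in for GRIS query results: GRIS URL -> metrics dictionary.
    gris_data: dict
    # Stand-in for the monitoring DB.
    db: dict = field(default_factory=dict)

    def discover(self):
        """Step 1: entities discovery."""
        return sorted(url for urls in self.giis_entries.values() for url in urls)

    def generate_config(self, entities):
        """Step 2: generation of config files (here, a list of check definitions)."""
        return [{"target": url, "check": "ldap_query"} for url in entities]

    def schedule(self, config):
        """Step 3: check scheduling (here, simply in order)."""
        return list(config)

    def collect(self, jobs):
        """Step 4: entities info collection into the DB."""
        for job in jobs:
            self.db[job["target"]] = self.gris_data.get(job["target"], {})

    def render(self):
        """Step 5: DB info rendering as plain text."""
        return "\n".join(f"{target}: {metrics}" for target, metrics in sorted(self.db.items()))


if __name__ == "__main__":
    server = MonitoringServer(
        giis_entries={"siteA": ["ldap://a"], "siteB": ["ldap://b"]},
        gris_data={"ldap://a": {"free_cpus": 4}},
    )
    server.collect(server.schedule(server.generate_config(server.discover())))
    print(server.render())
```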

Discovery service: entities list
This is the list of entities currently tracked by the monitoring system:
- Clusters
- Storage Services
- Worker Nodes (CL)
- Computing Elements (CL)
- Run Time Environments (CL)
- Virtual Organizations (CE)
- Storage Extents (WN)
- Network Adapters (WN)
- Storage Space (SE)
- Storage Protocols (SE)
CL = Cluster, WN = Worker Node/host, SE = Storage Service
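One way to picture this list is as a small hierarchy, assuming the tag in parentheses names the entity each item belongs to and that CE stands for Computing Element (the slide does not define CE). The mapping `PARENT` and the helper `scope_chain` are hypothetical names for this sketch.

```python
# Entity type -> parent entity type, as read from the parenthesised tags.
# None marks a top-level entity. This reading of the tags is an assumption.
PARENT = {
    "Cluster": None,
    "Storage Service": None,
    "Worker Node": "Cluster",
    "Computing Element": "Cluster",
    "Run Time Environment": "Cluster",
    "Virtual Organization": "Computing Element",
    "Storage Extent": "Worker Node",
    "Network Adapter": "Worker Node",
    "Storage Space": "Storage Service",
    "Storage Protocol": "Storage Service",
}


def scope_chain(entity_type):
    """Return the entity type followed by its ancestors up to a top-level entity."""
    chain = [entity_type]
    while PARENT[chain[-1]] is not None:
        chain.append(PARENT[chain[-1]])
    return chain


if __name__ == "__main__":
    print(" -> ".join(scope_chain("Network Adapter")))
```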

Data presentation service (2)
The data are presented in views that address different user types:
- VO views, for a VO manager
- Site views, for a grid manager
- Single entity views, for a grid/site manager (see next slides)

Data presentation service (3)

Data presentation service (4)

Real testbed deployment
- First tests on the DataTAG testbed -> March presentation at the 1st DataTAG review
- First wide deployment on the CMS/LCG-0 testbed (29/07/2003): 12 sites x 150 CPUs
  bo.infn.it, cern.ch, cmsfarm1.ba.infn.it, cnaf.infn.it, hep.ph.ic.ac.uk, in2p3.fr, its.uiowa.edu, lnl.infn.it, mi.infn.it, pd.infn.it, phy.bris.ac.uk, phy.ncu.edu.tw
- Running on the Certification Testbed of LCG-1 since the beginning of July
- Started a few days ago on the Deployment Testbed of LCG-1

Results
- Good response from the CMS/LCG-0 testbed
- The only major problem we found when monitoring large sites was caused by time (NTP) synchronization of the machines, so it is not a problem of the GridICE tool itself
- Thanks to the collaboration of the LCG people, we are significantly improving our monitoring tool
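The slide does not say how the synchronization problem was detected. As a minimal sketch, a collector could compare the timestamp reported with each host's latest sample against its own clock and flag hosts whose offset exceeds a tolerance. The function name, the input format, and the 60-second default are assumptions.

```python
def find_skewed_hosts(sample_times, collector_time, tolerance_s=60.0):
    """Return {host: offset in seconds} for hosts whose sample timestamp
    differs from the collector clock by more than tolerance_s."""
    skewed = {}
    for host, timestamp in sample_times.items():
        offset = timestamp - collector_time
        if abs(offset) > tolerance_s:
            skewed[host] = offset
    return skewed


if __name__ == "__main__":
    # Hypothetical hosts with timestamps in seconds since the epoch.
    samples = {"wn01": 1000.0, "wn02": 1200.0, "wn03": 700.0}
    print(find_skewed_hosts(samples, collector_time=1000.0))
```

A check like this mixes true clock skew with collection delay, so the tolerance has to be larger than the normal sampling interval.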

Next steps, short term
- Check plug-in refactoring: we made some tests with LDAP, and to improve performance we must aggregate the queries (fewer queries, more data transferred per query).
- Data reduction through the activation of thresholds
- We are considering some kind of caching of the last data pushed into the DB, to reduce the load on the DB
- DB schema improvement: dynamic discovery of the GRIS URL (at the moment with GlueInformationServiceURL).
- Introduction of new components: CESEBind, SECEBind.
- Activation of service checking (GRIS, GIIS, gridftp, …)
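Two of these ideas can be sketched briefly. The first function combines several per-entity LDAP searches into a single OR filter, using the standard LDAP filter string syntax with value escaping. The second class combines thresholds with a last-value cache so that a sample reaches the DB only when it differs enough from the last stored one. All names here are hypothetical, `GlueCEUniqueID` is used only as an example attribute, and the slide does not describe how GridICE planned to implement either idea.

```python
def escape_filter_value(value):
    """Escape characters that are special in an LDAP search filter value."""
    escaped = []
    for ch in value:
        if ch in "\\*()\0":
            escaped.append("\\%02x" % ord(ch))
        else:
            escaped.append(ch)
    return "".join(escaped)


def aggregate_filter(attribute, values):
    """Build one OR filter that matches any of the given values."""
    parts = "".join(f"({attribute}={escape_filter_value(v)})" for v in values)
    return f"(|{parts})"


class ThresholdCache:
    """Remember the last stored value per key and suppress small changes."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last_stored = {}

    def should_store(self, key, value):
        previous = self.last_stored.get(key)
        if previous is not None and abs(value - previous) < self.threshold:
            return False  # change is below the threshold: skip the DB write
        self.last_stored[key] = value
        return True


if __name__ == "__main__":
    print(aggregate_filter("GlueCEUniqueID", ["ce1.example.org", "ce2.example.org"]))
    cache = ThresholdCache(threshold=5.0)
    for load in (10.0, 12.0, 16.0):
        print(load, cache.should_store("wn01/cpu_load", load))
```

Comparing against the last stored value, rather than the last seen value, prevents a slow drift from going unrecorded.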

Next steps, short term (2)
- Grid Collective Service Monitoring (e.g. edg-broker, edg-replica-location-service)
- Job Monitoring at queue level (some open issues, e.g. VO)
- Native R-GMA support as GIS: we need a working and stable testbed with R-GMA as GIS, and must extend the CE GIN to support the new metrics.
- Hosts Role (via GlueHostService), in order to associate the service state with the proper host state
- Support for multiple Information Indexes
- Back-up Information Index