DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 1 / 29 GridICE The eyes of the grid A monitoring tool for a Grid Operation Center.

Slides:



Advertisements
Similar presentations
Logically Centralized Control Class 2. Types of Networks ISP Networks – Entity only owns the switches – Throughput: 100GB-10TB – Heterogeneous devices:
Advertisements

FP7-INFRA Enabling Grids for E-sciencE EGEE Induction Grid training for users, Institute of Physics Belgrade, Serbia Sep. 19, 2008.
1 CHEP 2000, Roberto Barbera Roberto Barbera (*) Grid monitoring with NAGIOS WP3-INFN Meeting, Naples, (*) Work in collaboration with.
DataGrid is a project funded by the European Union 22 September 2003 – n° 1 EDG WP4 Fabric Management: Fabric Monitoring and Fault Tolerance
A conceptual model of grid resources and services Authors: Sergio Andreozzi Massimo Sgaravatto Cristina Vistoli Presenter: Sergio Andreozzi INFN-CNAF Bologna.
GGF Toronto Spitfire A Relational DB Service for the Grid Peter Z. Kunszt European DataGrid Data Management CERN Database Group.
AGENDA Tools used in SQL Server 2000 Graphical BOL Enterprise Manager Service Manager CLI Query Analyzer OSQL BCP.
Makrand Siddhabhatti Tata Institute of Fundamental Research Mumbai 17 Aug
The ATLAS Production System. The Architecture ATLAS Production Database Eowyn Lexor Lexor-CondorG Oracle SQL queries Dulcinea NorduGrid Panda OSGLCG The.
Grid Information Systems. Two grid information problems Two problems  Monitoring  Discovery We can use similar techniques for both.
 Cloud computing  Workflow  Workflow lifecycle  Workflow design  Workflow tools : xcp, eucalyptus, open nebula.
WP3 RGMA Deployment Laurence Field / RAL Steve Fisher / RAL.
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America Information System (IS) Valeria Ardizzone.
Ramiro Voicu December Design Considerations  Act as a true dynamic service and provide the necessary functionally to be used by any other services.
1 BIG FARMS AND THE GRID Job Submission and Monitoring issues ATF Meeting, 20/06/03 Sergio Andreozzi.
Computational grids and grids projects DSS,
A.Guarise – F.Rosso 1 Enabling Grids for E-sciencE INFSO-RI Comprehensive Accounting Views on large computing farms. Andrea Guarise & Felice Rosso.
Guide to Linux Installation and Administration, 2e1 Chapter 10 Managing System Resources.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Information System on gLite middleware Vincent.
A monitoring tool for a GRID operation center Sergio Andreozzi (INFN CNAF), Sergio Fantinel (INFN Padova), David Rebatto (INFN Milano), Gennaro Tortone.
1 The new Fabric Management Tools in Production at CERN Thorsten Kleinwort for CERN IT/FIO HEPiX Autumn 2003 Triumf Vancouver Monday, October 20, 2003.
GLite Information System(s) Antonio Juan Rubio Montero CIEMAT 10 th EELA Tutorial. Madrid, May 7 th -11 th,2007.
Kurt Mueller San Diego Supercomputer Center NPACI HotPage Updates.
ATLAS Detector Description Database Vakho Tsulaia University of Pittsburgh 3D workshop, CERN 14-Dec-2004.
Deployment work at CERN: installation and configuration tasks WP4 workshop Barcelona project conference 5/03 German Cancio CERN IT/FIO.
EGEE is a project funded by the European Union under contract INFSO-RI Copyright (c) Members of the EGEE Collaboration GLUE Schema Sergio.
Certification and test activity IT ROC/CIC Deployment Team LCG WorkShop on Operations, CERN 2-4 Nov
US LHC OSG Technology Roadmap May 4-5th, 2005 Welcome. Thank you to Deirdre for the arrangements.
E-infrastructure shared between Europe and Latin America FP6−2004−Infrastructures−6-SSA gLite Information System Pedro Rausch IF.
LCG Accounting John Gordon Grid Deployment Board 13 th January 2004.
INFSO-RI Enabling Grids for E-sciencE GridICE: Grid and Fabric Monitoring Integrated for gLite-based Sites Sergio Fantinel INFN.
DataTAG Work Package 4 Meeting Bologna Simone Ludwig Brunel University 23rd and 24th of May 2002.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Using GStat 2.0 for Information Validation.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
EGEE is a project funded by the European Union under contract IST WS-Based Advance Reservation and Co-allocation Architecture Proposal T.Ferrari,
DataTAG is a project funded by the European Union DataTAG WP4 meeting, Bologna 29/07/2003 – n o 1 GLUE Schema - Status Report DataTAG WP4 meeting Bologna,
FP6−2004−Infrastructures−6-SSA E-infrastructure shared between Europe and Latin America gLite Information System Claudio Cherubino.
GraDS MacroGrid Carl Kesselman USC/Information Sciences Institute.
E-infrastructure shared between Europe and Latin America gLite Information System(s) Manuel Rubio del Solar CETA-CIEMAT EELA Tutorial, Mérida,
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI How to integrate portals with the EGI monitoring system Dusan Vudragovic.
EGEE is a project funded by the European Union under contract INFSO-RI Grid accounting with GridICE Sergio Fantinel, INFN LNL/PD LCG Workshop November.
Gennaro Tortone, Sergio Fantinel – Bologna, LCG-EDT Monitoring Service DataTAG WP4 Monitoring Group DataTAG WP4 meeting Bologna –
EGEE is a project funded by the European Union under contract IST Installation and configuration of gLite services Robert Harakaly, CERN,
EGEE is a project funded by the European Union under contract IST Experiment Software Installation toolkit on LCG-2
DataTAG is a project funded by the European Union International School on Grid Computing, 23 Jul 2003 – n o 1 GridICE The eyes of the grid PART I. Introduction.
FESR Trinacria Grid Virtual Laboratory gLite Information System Muoio Annamaria INFN - Catania gLite 3.0 Tutorial Trigrid Catania,
DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 1 / 10 Grid Monitoring A conceptual introduction to GridICE Sergio Andreozzi
StratusLab is co-funded by the European Community’s Seventh Framework Programme (Capacities) Grant Agreement INFSO-RI Demonstration StratusLab First.
II EGEE conference Den Haag November, ROC-CIC status in Italy
Claudio Grandi INFN Bologna Virtual Pools for Interactive Analysis and Software Development through an Integrated Cloud Environment Claudio Grandi (INFN.
TIFR, Mumbai, India, Feb 13-17, GridView - A Grid Monitoring and Visualization Tool Rajesh Kalmady, Digamber Sonvane, Kislay Bhatt, Phool Chand,
DGAS Distributed Grid Accounting System INFN Workshop /05/1009, Palau Giuseppe Patania Andrea Guarise 6/18/20161.
E-science grid facility for Europe and Latin America Updates on Information System Annamaria Muoio - INFN Tutorials for trainers 01/07/2008.
Grid Monitoring and Diagnostic Tools: GridICE, GSTAT, SAM Giuseppe Misurelli INFN-CNAF giuseppe.misurelli cnaf.infn.it.
Enabling Grids for E-sciencE Agreement-based Workload and Resource Management Tiziana Ferrari, Elisabetta Ronchieri Mar 30-31, 2006.
Enabling Grids for E-sciencE Claudio Cherubino INFN DGAS (Distributed Grid Accounting System)
Federating Data in the ALICE Experiment
Massimo Sgaravatto INFN Padova
INFNGRID Monitoring Group report
gLite Information System(s)
BDII Performance Tests
Sergio Fantinel, INFN LNL/PD
Ruslan Fomkin and Tore Risch Uppsala DataBase Laboratory
EDT-WP4 monitoring group status report
a VO-oriented perspective
gLite Information System(s)
DGAS Today and tomorrow
The EU DataGrid Fabric Management Services
gLite Information System
Information Services Claudio Cherubino INFN Catania Bologna
Presentation transcript:

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 1 / 29 GridICE The eyes of the grid A monitoring tool for a Grid Operation Center by DataTAG WP4 Sergio Fantinel, INFN LNL/PD

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 2 / 29 Sergio Fantinel’s Outline Discovery: resources, components Server side processes layout DB schema Graphs/data presentation Next steps Packaging/installation

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 3 / 29 Discovery entities RESOURCES: are the entities discovered from the GIS, ex: Cluster Head Nodes Storage Services Cluster Worker Nodes COMPONENTS: are the entities belonging to resources and discovered directly from resource itself, ex: Computing Elements Storage Space Network Adapters

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 4 / 29 Discovery purposes track the life of the entities: they are characterized by a status (new, available, disappeared, dead). Configure the monitoring system accordingly the status of these entities to collect metrics, status and other info.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 5 / 29 Discovery entities list This is the list of entities currently tracked by the monitoring system: Clusters Storage Services Worker Nodes (CL) Computing Elements (CL) Run Time Environments (CL) Virtual Organizations (CE) Storage Extents (WN) Network Adapters (WN) Storage Space (SE) Storage Protocols (SE) CL = Cluster WN = Worker Node/host SE= Storage Service

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 6 / 29 Discovery entities (2) Every entity (resource or component) is described by a number of characterizing information. Entities may be linked together: Ex. Network Adapter -> Worker Node -> Cluster To track the life of the entities it is used a SQL database where are stored also all the information related to every single entity.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 7 / 29 Discovery entities (3) Monitoring DB Monitorig Server GIIS GRIS GIIS Server Computing Element/ Storage Element SQL 1: LDAP Query 2: available CE/SE 3: LDAP Query 4: CEIDs, WNs, Steps 3,4 repeated for every CE/SE LDAP

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 8 / 29 Discovery config/check processes All the info collected are used to generate a number of Nagios configuration files (configuration process). Nagios schedule, according to some other DB stored parameters, at different interval times, the execution of a number of scripts (Nagios plug-ins wrote by the DataTAG WP4) that collect the info associated to every entity (check process) and put those in the DB.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 9 / 29 Server Side processes layout Discovery Config Nagios/scheduler Check Monitoring DB GIIS GRIS GRID 1A 1B 1C 2A 2B 34A 4B Gfx/Presentation WEB 5A 5B Developed by DataTAG WP4 1: entities discovery 2: generation of config files 3: check scheduling 4: entities info collection 5: DB info rendering Grid Information System LDAP Interface

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 10 / 29 Scheduling Discovery and config generation run as cron jobs; although the two processes can be scheduled independently at different time intervals, a discovery is just followed by a config generation. Check plug-ins are scheduled by Nagios; the interval for each one is set by a corresponding parameter in the DB.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 11 / 29 DataBase stored info Three types of information are stored in the database (~50 tables): Entities: actual status, historical status (fed by discovery process ) Info about entities (fed by check process) Monitoring configuration parameters (fed manually by monitoring administrator)

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 12 / 29 DataBase date/time attributes Many record types has associated two date/time attributes (Date, LastCheck) that determines the time interval where the record is valid. Checking the time distance between the Date attribute of a tuple and the LastCheck of the tuple that immediately precede, we can figure out problems of the resource or the monitoring system (VETOES generation, see later).

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 13 / 29 DataBase (2) For many entities the related info are grouped into two sets: slow, fast parameters; in the first group are those params that change very slowly in the time (ex. SO version, RAM size of a node, bogoMIPS,…); in the second are all the other parameters.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 14 / 29 DataBase check process For a set of parameters: the check process get the info from the GRIS, then compare the values against the last record in the DB and, if no one has changed, it update only the LastCheck attribute of the tuple; otherwise it insert a new tuple with the current date/time in Date/LastCheck attributes and the new info.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 15 / 29 DataBase thresholds We are introducing a threshold mechanism: For a selected set of attributes, nearly all the fast parameters, we define a threshold If the new values for a tuple doesn’t exceed the related threshold over the old values, we change only the LastCheck attribute

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 16 / 29 DataBase thresholds (2) This threshold mechanism become very useful to save DB space resource allocation. For example We doesn’t need to appreciate a variation of a CPU load less the 5- 10% (think to the fluctuations around the 100% idle when no job is running on a CPU) The same concept may be applied to a Storage Area with a 5-10MB threshold

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 17 / 29 DB Example: GE simple info RESOURCE Date LastCheck IP ResName ID_GeType Group Cluster Date Data string(16) string(64) Int string(64) int RL_GE_CPU_SLOWPARAM Date LastCheck Vendor Model Version ClockSpeed TimeInterval date string(32) string(64) string(32) int RL_GE_CPU_FASTPARAM Date LastCheck Load1Min Load5Min Load15Min date int User Nice System Idle int NB: this is a simplified structure

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 18 / 29 RL_GE_CPU_FASTPARAM Date LastCheck FreeCpus EstimatedResponseTime RunningJobs WaitingJobs WorstResponseTime date int DB Example: queues RESOURCE RL_CE_CEID Date LastCheck CeidName date string(256) RL_GE_CPU_SLOWPARAM Date LastCheck CeidStatus TotalCpus MaxCpuTime MaxRunningJobs date int Int int MaxTotalJobs MaxWallClockTime Priority LRMSType LRMSVersion int string(16) NB: this is a simplified structure

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 19 / 29 RL_SA_FASTPARAM DB Example: status history RESOURCE RL_SE_SA RL_SA_SLOWPARAM NB: this is a simplified structure RL_SA_CHANGE_STATUS ID_ChgStatus Date int date RL_RES_CHANGE_STATUS ID_ChgStatus Date int date

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 20 / 29 GETYPE DB Example: configuration RL_GETYPE_SERVICE NB: this is a simplified structure SERIVCE ServiceName CheckInterval Template Command string(64) Int string(32) Service List: CheckCeidFP CheckCeidSP CheckSeFP CheckSeSP CheckWnCpu CheckWnCtxc CheckWnInfo CheckWnIntc CheckWnMem CheckWnMpc CheckWnNet CheckWnPrcc CheckWnSckc CheckWnSpc CheckWnStg CheckWnVirtual

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 21 / 29 Data presentation We based our work on the JpGraph tool (PHPv4), a very powerful set of API to generate a number of graph types that use directly the GDLib API (v2 is needed for an accurate colour management) To overcame some limitation in the managing of date/time labels, merging and plotting of large set of date of JpGraph, and to have in general a fine control, we developed a decoupling layer between the DB and the render engine.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 22 / 29 Data presentation (2) Monitoring DB JpGraph GDLib Data Load Resample Data Merging Graph generation Developed by DataTAG WP4 Main Analysis Process

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 23 / 29 Data presentation (3) Monitoring DB Data Load Time interval Object descriptors (Table, fields,…) Veto thresholdArray of homogeneous tuples with their time interval validity (ALV) Array of VETOES (if option enabled) (AVT)

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 24 / 29 Data presentation (4) Resample Time interval (ALV) (AVT) X-axis plot (dot) size (xSize) Array with xSize number of elements; Every element is a record of metrics values; each value is the average of the metric function in the time interval slot represented by the xpoint considered Graph Labels Graph VETO bars (if enabled)

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 25 / 29 Data presentation (5) The presentation of the date was made addressing different user types (see later on the live demo session): Vo views, for a VO manager Site views, system manager Single entity views

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 26 / 29 Next steps, integration the next HIGH PRIORITY step is the integration with the EDG 2.0 information system. Due to the structure of our monitoring system, the migration should be very smooth and fast.

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 27 / 29 Next steps, long term Glue Network Service (NE) Job monitoring Collective services (RB, LB, ROS, RLS,…) We are thinking to evaluate different technologies that can help us to improve our tool; we have in mind, for example, OGSA and JINI. Major issue: distribution of the monitoring DB to get more scalability Authentication/authorization Remote system management Better analysis of monitoring date and their presentation Personal monitoring

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 28 / 29 Packaging/installation Server side Linux Red Hat 7.3 PostgreSQL PHP (GDLibv2) JpGraph 1.2 Nagios

DataTAG is a project funded by the European Union CERN, 8 May 2003 – n o 29 / 29 Packaging/installation (2) Server side: Guide to installation and configuration Client side: Packaged rpms to install on the Head Node and on the Worker Nodes (fmon server/agents) Replacing the Glue-CE.schema file on the Head Node (GLUE+)