Grid Service Monitoring Working Group

Slides:



Advertisements
Similar presentations
EGEE-II INFSO-RI Enabling Grids for E-sciencE The gLite middleware distribution OSG Consortium Meeting Seattle,
Advertisements

CERN IT Department CH-1211 Genève 23 Switzerland t Messaging System for the Grid as a core component of the monitoring infrastructure for.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
OSG Area Coordinators Meeting Operations Rob Quick 2/22/2012.
MyOSG: A user-centric information resource for OSG infrastructure data sources Arvind Gopu, Soichi Hayashi, Rob Quick Open Science Grid Operations Center.
OSG Area Coordinators Meeting Operations Rob Quick 2/22/2012.
OSG Operations and Interoperations Rob Quick Open Science Grid Operations Center - Indiana University EGEE Operations Meeting Stockholm, Sweden - 14 June.
HPDC 2007 / Grid Infrastructure Monitoring System Based on Nagios Grid Infrastructure Monitoring System Based on Nagios E. Imamagic, D. Dobrenic SRCE HPDC.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Simply monitor a grid site with Nagios J.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
DOSAR Workshop, Sao Paulo, Brazil, September 16-17, 2005 LCG Tier 2 and DOSAR Pat Skubic OU.
02/07/09 1 WLCG NAGIOS Kashif Mohammad Deputy Technical Co-ordinator (South Grid) University of Oxford.
CERN IT Department CH-1211 Geneva 23 Switzerland t Open projects in Grid Monitoring IT-GS-MDS Section Meeting 25 th January 2008.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks GStat 2.0 Joanna Huang (ASGC) Laurence Field.
HPDC Report Domenico Vicinanza CERN IT-GD-OPS CERN, July 12 th weekly OPS section meeting.
Enabling Grids for E-sciencE System Analysis Working Group and Experiment Dashboard Julia Andreeva CERN Grid Operations Workshop – June, Stockholm.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios for Grid Services E. Imamagic, SRCE.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team James Casey EGEE’08.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Multi-level monitoring - an overview James.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Site Monitoring with Nagios E. Imamagic,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE-EGI Grid Operations Transition Maite.
Grid Operations Lessons Learned Rob Quick Open Science Grid Operations Center - Indiana University.
CERN IT Department CH-1211 Geneva 23 Switzerland t GDB CERN, 4 th March 2008 James Casey A Strategy for WLCG Monitoring.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Wojciech Lapka SAM Team CERN EGEE’09 Conference,
WLCG Monitoring Roadmap Julia Andreeva, CERN , WLCG workshop, CERN.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
Status Organization Overview of Program of Work Education, Training It’s the People who make it happen & make it Work.
LCG workshop on Operational Issues CERN November, EGEE CIC activities (SA1) Accounting: current status
The OSG and Grid Operations Center Rob Quick Open Science Grid Operations Center - Indiana University ATLAS Tier 2-Tier 3 Meeting Bloomington, Indiana.
RSV: OSG Grid Fabric Monitoring and Interoperation with WLCG Monitoring Systems Rob Quick, Arvind Gopu, and Soichi Hayashi Computing in High Energy and.
OSG Area Coordinators Meeting Operations Rob Quick 1/11/2012.
CERN IT Department CH-1211 Geneva 23 Switzerland t A proposal for improving Job Reliability Monitoring GDB 2 nd April 2008.
Julia Andreeva on behalf of the MND section MND review.
EGI-InSPIRE RI EGI-InSPIRE EGI-InSPIRE RI Monitoring of the LHC Computing Activities Key Results from the Services.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Monitoring Tools E. Imamagic, SRCE CE.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Regional Nagios Emir Imamagic /SRCE EGEE’09,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations Automation Team Kickoff Meeting.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Ian Bird All Activity Meeting, Sofia
Area Coordinator Report for Operations Rob Quick 4/10/2008.
WLCG Technical Evolution Group: Operations and Tools Maria Girone & Jeff Templon GDB 12 th October 2011, CERN.
Open Science Grid OSG Resource and Service Validation and WLCG SAM Interoperability Rob Quick With Content from Arvind Gopu, James Casey, Ian Neilson,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Configuration Data or “What should be.
CERN - IT Department CH-1211 Genève 23 Switzerland t IT-GD-OPS attendance to EGEE’09 IT/GD Group Meeting, 09 October 2009.
1 Models for Monitoring James Casey, CERN WLCG Service Reliability Workshop 27th November, 2007.
1 Grid Service Monitoring James Casey, CERN IT-GD WLCG/OSG Operations Meeting 14th June 2007.
SAM architecture EGEE 07 Service Availability Monitor for the LHC experiments Simone Campana, Alessandro Di Girolamo, Nicolò Magini, Patricia Mendez Lorenzo,
CERN IT Department CH-1211 Genève 23 Switzerland t Monitoring: Present and Future Pedro Andrade (CERN IT) 31 st August.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operations automation team presentazione.
RSV: OSG Grid Monitoring and User Customizable Views Rob Quick, Arvind Gopu, and Soichi Hayashi High Performance Distributed Computing Location: Munich,
6 th CIC on Duty meeting Lyon 27-29/03/2006 Enabling Grids for E-sciencE Grid INTER-Operations Hélène Cordier EGEE/WLCG Operations IN2P3 Computing Centre.
Open Science Grid Configuring RSV OSG Resource & Service Validation Thomas Wang Grid Operations Center (OSG-GOC) Indiana University.
CERN IT Department CH-1211 Geneva 23 Switzerland t Michel Jouvin (GRIF/LAL) on behalf of James Casey (CERN) (All materials from J. Casey)
Monitoring Working Group Update Grid Deployment Board 5 th December, CERN Ian Neilson.
Grid Colombia Workshop with OSG Week 2 Startup Rob Gardner University of Chicago October 26, 2009.
Monitoring BOF, 23 rd Jan 2007 Grid Service Monitoring Working Group Monitoring WG BOF, January 2007 James Casey/Ian Neilson.
Enabling Grids for E-sciencE EGEE-II INFSO-RI ROC managers meeting at EGEE 2007 conference, Budapest, October 1, 2007 Admin Matters Vera Hanser.
May pre GDB WLCG services post EGEE Josva Kleist Michael Grønager Software Coord NDGF Director CERN, May 12 th 2009.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Operational Tools M2 Update James Casey.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios Grid Monitor E. Imamagic, SRCE OAT.
James Casey, CERN IT-GD WLCG Workshop 1st September, 2007
NGI and Site Nagios Monitoring
Monitoring and Information Services Technical Group Report
Evolution of SAM in an enhanced model for monitoring the WLCG grid
March Availability Report for EGEE Sites based on Nagios
A Messaging Infrastructure for WLCG
Monitoring in EGEE Automatisierung & Regionalisierung im Hinblick auf EGI Torsten Antoni (KIT), James Casey (CERN), Sabine Reißer (KIT)
LCG Operations Workshop, e-IRG Workshop
Leigh Grundhoefer Indiana University
gLite The EGEE Middleware Distribution
User Accounting Integration Spreading the Net.
Presentation transcript:

Grid Service Monitoring Working Group Rob Quick Open Science Grid Operations Center - Indiana University HPDC 2007 Monterey, California - 25 June 2007

Grid Services Monitoring WG Mandate “….to help improve the reliability of the grid infrastructure….” “…. provide stakeholders with views of the infrastructure allowing them to understand the current and historical status of the service. …” “… stakeholder are site administrators, grid service managers and operations, VOs, Grid Project management” R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Monitoring You can’t manage what you don’t measure... R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

Aims of grid services WG Not to provide yet another technical solution But, Improve reliability of WLCG Consolidate existing solutions Improve communication Reduce overlap Increase sharing R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 How? Engage with stakeholders Operations meetings WLCG Workshops Questionnaires to site managers Grid Service providers (EGEE, OSG) Grid Middleware providers (gLite, VDT) Monitoring software providers (SAM, VORS, GridIce, MonAmi, GridView, LEMON, Nagios, …) External experts (openlab EDS collaboration) Other Working Groups R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

Tasks of grid services WG Best practice notes How to many grid proxies for monitoring Message-level Security for monitoring What information can/should be passed through site boundary … Create set of ‘standard’ WLCG probes And how to calculate availability based on the metrics produced R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Direction Focus on the interaction points between the different systems Allow for diversity across different grid infrastructures “Specifications, not Standards” Timescales mean we can’t get involved in long and heavyweight standards activities Take best practices from existing systems, and document them Implement simple prototypes And mature the bits that work ! Get something out to the stakeholders Close feedback loop is the key to adoption Plan for a “standards based” solution in the future R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 High-Level Model R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Terminology Metric A data value gathered that tells us something about a service Probe The actual code which gathers the metric/metrics Check & Sensor A ‘probe’ in Nagios and LEMON respectively R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Locality of Probes ‘local’ can mean two things ;( ‘local’ and ‘remote’ with respect to probing the interface of the service local means on the site remote means external to the site (host-)local probes Gathering information from the operating system level Traditional fabric management probes R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Specifications Probe Specification Defines how a fabric monitoring system can interact with probes that test grid services Simple text-based protocol (lightweight) Decouples grid probes from the specifics of the fabric monitoring system Allows for currently existing probes to be re-used by any monitoring system SAM Tests EGEE CE ROC Nagios testing OSG Tests R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Sample Probe Output R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

Example of exchange format <?xml version="1.0"?> <root xmlns="http://cern.ch/grid-mon/2007/05/mon-exchange-schema/"> <Region name="CERN"> <Site name="CERN-PROD"> <type>Production</type> <status>Certified</status> <SiteMetric name="site-daily-avail"> <measurement> <status>ok</status> <summary>0.3</summary> <timestamp>2007-02-25T00:00:00Z</timestamp> </measurement> </SiteMetric> <Service endpoint="https://ce101.cern.ch:2119/" type="CE"> <isMonitored>true</isMonitored> <inMaintenance>false</inMaintenance> … </Service> </Site> </Region> </root> R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Site monitoring We can’t/won’t impose a solution on sites They might/should have something already Specification based approach allows our probes fit into any fabric monitoring system Data Exchange format allows higher-level services consume the data regardless of fabric monitoring system R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

Summary Effort invested to understand the current landscape Approach for improvement based on specifications of interfaces between components Prototype has been developed and tested on a small scale Now looking for early adopters to get feedback https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007

R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007 Thank You Special to the Thanks James Casey and the GOC Team: John Rosheck, Tim Silvers, Kyle Gross, and Arvind Gopu www.opensciencegrid.org www.grid.iu.edu R. Quick "WLCG-OSG-EGEE Interop" 26 Jan 2007