RSV and Nagios in OSG
Rob Quick
March 11, 2008 USCMS Tier-2 Workshop

Current State of OSG
- ~100 sites
- ~30 VOs
- April 8th:
  - 216,000 jobs (85% successful)
  - 375,000 wallclock hours
  - About half of the jobs ran on resources NOT owned by the VO that submitted them
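The April 8th figures imply a few derived numbers. Because "85%" and "about half" are the slide's own rounded values, these are approximations. The mean wallclock time is averaged over all jobs, failed ones included:

```latex
\begin{aligned}
N_{\text{successful}} &\approx 0.85 \times 216{,}000 = 183{,}600 \text{ jobs}\\
\bar{t}_{\text{wall}} &= \frac{375{,}000\ \text{h}}{216{,}000\ \text{jobs}} \approx 1.74\ \text{h per job}\\
N_{\text{other-VO}} &\approx \tfrac{1}{2} \times 216{,}000 = 108{,}000 \text{ jobs}
\end{aligned}
```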

Recent and Upcoming Operations Highlights
- WLCG SAM reporting of availability statistics
  - SAM interface
  - GridView interface
- OIM Registration Database
- RSV Version 2
  - Easier to configure and maintain
  - SE probes
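Availability statistics are built up from many individual probe results over time. The sketch below illustrates that idea only. It is not the SAM or GridView algorithm, which weights time intervals and combines several services per site. The `Sample` type and `availability` function are hypothetical. They assume Nagios-style status values, count only OK as available, and leave UNKNOWN results out of the denominator.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    """One probe result for a service: when it ran and its metricStatus."""
    timestamp: int  # seconds since the epoch
    status: str     # assumed Nagios-style: OK, WARNING, CRITICAL, UNKNOWN


def availability(samples: list[Sample]) -> float | None:
    """Return the fraction of known samples whose status is OK.

    Simplifying assumptions (not the SAM/GridView algorithm): every sample
    carries equal weight, only OK counts as available, and UNKNOWN samples
    are left out of the denominator. Returns None if no sample has a
    known status.
    """
    known = [s for s in samples if s.status != "UNKNOWN"]
    if not known:
        return None
    ok = sum(1 for s in known if s.status == "OK")
    return ok / len(known)


# Four probe runs 15 minutes apart: one failure, one inconclusive result.
history = [
    Sample(0, "OK"),
    Sample(900, "OK"),
    Sample(1800, "CRITICAL"),
    Sample(2700, "UNKNOWN"),
]
print(f"availability: {availability(history):.2f}")  # 2 OK of 3 known -> 0.67
```

Excluding UNKNOWN keeps a broken monitoring host from lowering a site's score. A real scheme has to decide that policy explicitly.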

SAM Environment

GridView

RSV Version 2
- New probes
  - SE
  - GUMS
  - Software versions
  - CA certificates up to date
- New simplified configuration scheme
- Service certificates!
- VO access to RSV database info and web interface
- Hook to OIM

A Probe

    probes]$ ./jobmanagers-status-probe -u proton.fis.cinvestav.mx -m all
    metricName: org.osg.batch.jobmanager-fork-status
    metricType: status
    timestamp: T11:57:41Z
    metricStatus: OK
    serviceType: globus-GRAM-fork
    serviceURI: proton.fis.cinvestav.mx
    gatheredAt: feynman.uits.iupui.edu
    summaryData: OK
    detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-fork", its status when last checked was a valid one ("ACTIVE"); and finally the test job was successfully cleaned up!
    EOT
    metricName: org.osg.batch.jobmanager-pbs-status
    metricType: status
    timestamp: T11:57:41Z
    metricStatus: OK
    serviceType: globus-GRAM-PBS
    serviceURI: proton.fis.cinvestav.mx
    gatheredAt: feynman.uits.iupui.edu
    summaryData: OK
    detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-pbs", its status when last checked was a valid one ("DONE"); and finally the test job was successfully cleaned up!
    EOT
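Each probe run emits one metric record per tested service, and downstream consumers must split that stream back into records. The sketch below is a minimal parser. It assumes the usual line-oriented layout of this output: one `key: value` pair per line, with each record closed by a line that reads `EOT`. The function name `parse_rsv_records` is hypothetical and is not part of RSV.

```python
from __future__ import annotations


def parse_rsv_records(text: str) -> list[dict[str, str]]:
    """Split RSV probe output into one dict per metric record.

    Assumes one "key: value" pair per line and a line reading "EOT" after
    each record. A line that does not start with a single-word key followed
    by a colon (e.g. a wrapped detailsData message) is appended to the
    previous value; such lines are ignored when no record is open (e.g. the
    shell prompt line).
    """
    records: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_key: str | None = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line == "EOT":  # end of one metric record
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key and " " not in key:
            current[key] = value.strip()  # keeps "T11:57:41Z" intact
            last_key = key
        elif last_key is not None:
            current[last_key] += "\n" + line  # continuation line
    if current:  # keep a final record even if EOT is missing
        records.append(current)
    return records


sample = """\
metricName: org.osg.batch.jobmanager-fork-status
metricType: status
timestamp: T11:57:41Z
metricStatus: OK
serviceType: globus-GRAM-fork
serviceURI: proton.fis.cinvestav.mx
gatheredAt: feynman.uits.iupui.edu
summaryData: OK
detailsData: A test job was successfully submitted to "proton.fis.cinvestav.mx/jobmanager-fork", its status when last checked was a valid one ("ACTIVE"); and finally the test job was successfully cleaned up!
EOT
"""
for rec in parse_rsv_records(sample):
    print(rec["metricName"], rec["metricStatus"])
```

The parser splits on the first colon only, so values that contain colons, such as the timestamp, survive unchanged.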

Local RSV Structure

RSV Reporting to Nagios Console
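One common way to feed externally gathered results into a Nagios console is the passive check mechanism. In that mechanism, a line of the form `[time] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output` is written to the Nagios command file. The slide does not say how RSV's hook works. The sketch below therefore assumes a hypothetical mapping in which the Nagios host is the record's `serviceURI` and the service description is its `metricName`. The status codes are the standard Nagios plugin ones.

```python
from __future__ import annotations

import time

# Standard Nagios plugin return codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def to_nagios_command(record: dict[str, str], now: int | None = None) -> str:
    """Format one RSV metric record as a Nagios passive service check result.

    Assumptions (illustrative, not RSV's actual Nagios hook): the Nagios
    host_name is the record's serviceURI, the service description is its
    metricName, and any status Nagios does not know maps to UNKNOWN.
    """
    ts = int(time.time()) if now is None else now
    code = NAGIOS_CODES.get(record.get("metricStatus", "UNKNOWN"), 3)
    # An external command must fit on a single line.
    output = record.get("summaryData", "").replace("\n", " ")
    return (
        f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{record['serviceURI']};{record['metricName']};{code};{output}"
    )


record = {
    "metricName": "org.osg.batch.jobmanager-pbs-status",
    "metricStatus": "OK",
    "serviceURI": "proton.fis.cinvestav.mx",
    "summaryData": "OK",
}
print(to_nagios_command(record, now=1200000000))
# [1200000000] PROCESS_SERVICE_CHECK_RESULT;proton.fis.cinvestav.mx;org.osg.batch.jobmanager-pbs-status;0;OK
```

The resulting line would be appended to the command file named by `command_file` in `nagios.cfg`. The matching host and service must already be defined in Nagios with passive checks accepted.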

Provided by: Sarah Williams

History of Monitoring in OSG
"Monitoring is always a difficult beast to tame. Much careful thought has gone into it over the years, and the highway to this point is littered with lots of dead monitoring bodies. I think the current effort is striving for simplicity, and I hope it gets there!"
-Alan Sill (TACC)

Planned Central Structure
Can it be this simple?

Questions/Comments?