Introduction to OAT presentations
James Casey
SA1 Management Meeting
Abingdon, 3rd December 2008
OAT update since EGEE’08
- 4 phone conferences
  - Minutes: Here
- F2F meetings:
  - With Gridview team (availability calculation)
  - With GGUS team
  - SAM/regional dashboard teams
  - GOCDB/SAM
- Mailing list: egee3-operations-automation-discuss
  - Sent more discussion mails to the -discuss list and the roc-managers list
  - Very little response outside of OAT members
OAT update since EGEE’08
- Documents produced
  - List of known tasks within OAT scope
    - Spreadsheet: 0810-Work items-v1.2.xls
  - Breakdown of components in multi-level monitoring
    - Image: 0810-Work Items deployment-v1.4.png
  - Response to MSA1.3 - Quality metrics for quarterly reports
    - Spreadsheet: 0810-MSA1.3-Response-v1.0.xls
    - Document: 0810-MSA1.3-Response-v1.0.doc
- New web areas:
  - Documents/FAQs:
    - Sharepoint: https://espace.cern.ch/sa1-share/oat/default.aspx
  - General information and tutorials/guides:
    - Twiki: https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III
Architecture of the regional solution
- Use Nagios to probe sites from the ROC
- Have a self-contained set of components inside the region for:
  - Storing topology of the regional grid
    - From GOCDB and BDII
  - Storing metric results from probes
  - Raising alarms
  - Raising tickets
  - Viewing metric history and details for debugging
- Central data stores and components for project-level systems
  - Project-level metric store
  - Availability calculation
  - GOCDB, Information system monitoring (Gstat)
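As an illustration of the regional components listed above (storing metric results, keeping history for debugging, raising alarms), here is a minimal sketch in Python. It is not the OAT implementation: `MetricResult`, `RegionalStore` and the rule "raise an alarm only on a transition into CRITICAL" are assumptions made for this example. The only fixed convention used is the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass
class MetricResult:
    """One probe result (hypothetical record layout for this sketch)."""
    site: str
    service: str
    metric: str
    status: int
    timestamp: int


@dataclass
class RegionalStore:
    """In-memory stand-in for a regional metric store with alarm raising."""
    history: Dict[Tuple[str, str, str], List[MetricResult]] = field(default_factory=dict)
    alarms: List[MetricResult] = field(default_factory=list)

    def record(self, result: MetricResult) -> bool:
        """Store a result; return True if it raised a new alarm."""
        key = (result.site, result.service, result.metric)
        previous = self.history.setdefault(key, [])
        last_status = previous[-1].status if previous else OK
        previous.append(result)
        # Assumed policy: alarm only when the metric newly becomes CRITICAL,
        # so a persistent failure does not raise a fresh alarm on every probe.
        if result.status == CRITICAL and last_status != CRITICAL:
            self.alarms.append(result)
            return True
        return False
```

The history kept per (site, service, metric) key is what a "metric history and details" view would read from; ticket raising could be driven from the `alarms` list in the same way.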
Architecture of the regional solution
- Use Messaging to pass information to/from:
  - Site components
  - Regional components
  - Project-level components
- Use Nagios at site for improving site reliability
  - Many more probes deployed
    - Including on service nodes and worker nodes directly
  - Used by site manager to respond more quickly to problems
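The site, regional and project-level message flow above can be sketched as a small publish/subscribe example. This is an illustrative sketch only: `InMemoryBus` stands in for a real message broker (the slides do not name one), and the topic names, the `wire_region` helper and the message fields are all hypothetical.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBus:
    """Toy publish/subscribe bus standing in for a real message broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: dict) -> None:
        # Serialise to JSON and back so each handler gets its own copy,
        # as it would when the message crosses a real network.
        body = json.dumps(payload, sort_keys=True)
        for handler in self._subscribers[topic]:
            handler(json.loads(body))


def wire_region(bus: InMemoryBus, region: str, regional_store: List[dict],
                site_topic: str, project_topic: str) -> None:
    """Regional component: keep each site result locally, then forward it
    to the project level tagged with the region name."""

    def on_site_result(message: dict) -> None:
        regional_store.append(message)
        bus.publish(project_topic, dict(message, region=region))

    bus.subscribe(site_topic, on_site_result)
```

A site-level Nagios would publish each probe result to `site_topic`; the region stays self-contained because it keeps its own copy, while the project-level store receives the forwarded message.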
Staged approach
- 8 months into the project
- 16 months left, including deployment
- Staged approach
  - Can stop at any point
  - And leave the rest of the work to EGI
- OAT Strategy document defines the endpoint
  - Fully regionalized and interoperable
- Now we lay out the plan for the various components
  - Regional dashboard
  - Multi-level monitoring
  - Metric store, alarms and availability calculation
  - GOCDB
  - Gstat
Components in Multi-level monitoring
Areas of work
- Focusing on the regional monitoring solution
- Areas of work are driven by the allocation of effort committed in the WBS for monitoring
  - Other things will be included as we get effort
- Full list of tasks, including development, deployment, testing and support tasks, in the Work items spreadsheet
- Priority areas for future work
  - Monitoring of the monitoring system
  - System management for messaging
  - Core services monitoring
  - Quarterly report metrics portal (MSA1.3)
- To change this priority, contribute some effort