UK Particle Physics GridPP Collaboration meeting - R.P.Middleton (RAL/PPD) 23-25th May

Grid Monitoring Services
Robin Middleton, RAL/PPD, 24-May-01

Overview
  - What is Monitoring?
  - GGF Perf-WG
  - DataGrid WP3
  - Example : Netlogger
  - Summary

Introduction
  - Information Services part dealt with separately today
  - DataGrid WorkPackage 3 (WP3)
      - UK leadership / responsibility
      - WP3 = Grid Monitoring AND Information Services
  - Global Grid Forum - Perf Mon Workgroup

What is Monitoring?
  - Application performance
  - Fabric availability
  - Network availability / performance
  - Event / Alert Archives
  - Forecasting (e.g. NWS)
  - Issues
      - update/read frequency
      - information streaming
      - hierarchical vs. relational
      - relaxed coherence; timestamps
      - scalable; non-invasive
      - non-repeatable
  - Monitoring vs. Monitoring & Information ?

Boundaries
  [Diagram: Monitoring and its boundaries with Mass Storage, Computing Fabric, Network, Application, Workload Mgt, DataMan, End-Users and Sys/Grid-Admin]

GGF : Perf-WG
  “The Grid Performance working group is focused on defining standards and best practices for the gathering, representation, storage, distribution, and query of performance information about Grid resources and applications.”
  Four Projects (!)
  1. Define a schema for data formats for performance monitoring. This would be a common interchange format that tools could use to interoperate.
  2. Taxonomy / classification of performance monitoring and analysis tools.
  3. Survey of existing tools classified by the above taxonomy.
  4. Recommendations on the aspects of grid applications, services and resources that should be monitored.
  5. The development of performance monitoring tools based upon the survey of tools.
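
The first project, a common interchange format, can be pictured with a small sketch. Everything here is an assumption made for illustration: the field names, the `PerfEvent` class and the choice of JSON as the encoding are hypothetical and are not taken from the Perf-WG documents.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class PerfEvent:
    """One performance measurement in a tool-neutral form (hypothetical schema)."""
    timestamp: float   # seconds since the Unix epoch, UTC
    host: str          # machine where the measurement was taken
    source: str        # sensor or program that produced it
    event: str         # event name, e.g. "cpu.load"
    values: dict = field(default_factory=dict)  # measured quantities

    def to_json(self) -> str:
        # sort_keys gives a stable encoding, so two tools emit identical text
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PerfEvent":
        return cls(**json.loads(text))


# Illustrative values only; the host name is a placeholder.
sample = PerfEvent(990700000.0, "host-a.example.org", "cpu_sensor",
                   "cpu.load", {"load1": 0.42})
wire = sample.to_json()
restored = PerfEvent.from_json(wire)
print(wire)
```

Any tool that can write and read such a record could exchange data with any other, which is the interoperability the first project aims at.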

GGF Perf-WG : Use Cases
  1: Instrumented library for performance measurement (e.g. I/O system)
  2: Netlogger/DPSS monitoring streams to log file
  3: JAMM (Java) sensors stream data to a GUI
  4: JAMM/Port Monitor
  5: Fault detection & analysis
  6: Job progress monitoring
  7: Distributed system performance analysis
  8: Network-aware, self-tuning applications
  9: Data replication (choice of “best” location)
  10: Scheduling & prediction services
  11: Auditing systems
  12: Configuration monitoring
  13: User application monitoring
  14: Application self-tuning
  15: Real-time adaptive simulation & presentation

DataGrid : WorkPackage 3
  The aim of this workpackage is to specify, develop, integrate and test tools and infrastructure to enable end-user and administrator access to status and error information in a Grid environment and to provide an environment in which application monitoring can be carried out. This will permit both job performance optimisation as well as allowing for problem tracing and is crucial to facilitating high performance Grid computing.

Architecture (GGF : Perf-WG)
  [Diagram: a Producer on each of Sensor Host - A and Sensor Host - B, a Consumer, and a Directory Service; interactions labelled Publish, Subscribe and Discovery]
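
A minimal in-process sketch of the three roles in this architecture may help. It assumes that Producers publish what they offer to the Directory Service, that a Consumer uses the directory only for discovery, and that the Consumer then subscribes to each Producer directly. All class and method names are hypothetical; this is not the Perf-WG interface.

```python
from collections import defaultdict


class DirectoryService:
    """Records which producers offer which event type (no data passes through it)."""

    def __init__(self):
        self._producers = defaultdict(list)

    def publish(self, event_name, producer):
        self._producers[event_name].append(producer)

    def discover(self, event_name):
        return list(self._producers.get(event_name, []))


class Producer:
    """Runs on a sensor host and forwards sensor readings to its subscribers."""

    def __init__(self, host, directory):
        self.host = host
        self._directory = directory
        self._subscribers = defaultdict(list)

    def advertise(self, event_name):
        self._directory.publish(event_name, self)

    def subscribe(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def emit(self, event_name, value):
        # Data flows directly from producer to consumer.
        for callback in self._subscribers[event_name]:
            callback(self.host, event_name, value)


class Consumer:
    def __init__(self, directory):
        self._directory = directory
        self.received = []

    def watch(self, event_name):
        # Discovery via the directory, then a direct subscription.
        for producer in self._directory.discover(event_name):
            producer.subscribe(event_name, self._on_event)

    def _on_event(self, host, event_name, value):
        self.received.append((host, event_name, value))


directory = DirectoryService()
producer_a = Producer("sensor-host-a", directory)
producer_b = Producer("sensor-host-b", directory)
producer_a.advertise("cpu.load")
producer_b.advertise("cpu.load")

consumer = Consumer(directory)
consumer.watch("cpu.load")
producer_a.emit("cpu.load", 0.7)
producer_b.emit("cpu.load", 0.2)
print(consumer.received)
```

Keeping the directory out of the data path is the design choice worth noting: it only answers "who produces what", so high-rate streams do not load it.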

WP3 : Tasks
  - Umbrellas
      - Task 3.1: Requirements & Design (month 1-12)
      - Task 3.2: Current Technology (month 1-12)
      - Task 3.3: Infrastructure (month 7-24)
      - Task 3.4: Analysis & Presentation (month 7-24)
      - Task 3.5: Test & Refinement (month 19-36)

WP3 : Deliverables (as in the TA)
  - D3.1 (Report) Month 12: Evaluation Report of current technology
  - D3.2 (Report) Month 9: Detailed architectural design report and evaluation criteria (also input to WP12 architecture deliverable)
  - D3.3 (Prototype) Month 9: Components and documentation for the First Project Release (see WP 6)
  - D3.4 (Prototype) Month 21: Components and documentation for the Second Project Release (see WP 6)
  - D3.5 (Prototype) Month 33: Components and documentation for the Final Project Release (see WP 6)
  - D3.6 (Report) Month 36: Final evaluation report

WP3 : Milestones (as in the TA)
  - M3.1 Month 6: Decide baseline architecture & technologies.
  - M3.2 Month 9: Provide requirements for collation by Project Architect
  - M3.3 Month 9: Prototype components integrated into First Project Release (see WP 6)
  - M3.4 Month 21: Interim components integrated into Second Project Release (see WP 6)
  - M3.5 Month 33: Final components integrated into Final Project Release (see WP 6)

WP3 : First Release (PM9)
  - Information services based on a new version of the Globus MDS (soon to be in alpha release).
  - Rudimentary implementation of a relational approach to information services.
  - A set of APIs in support of both MDS and GMA approaches.
  - Basic presentation of performance monitoring data based around Netlogger

WP3 : Effort
  [Table: Funded / Unfunded / Total effort for PPARC, SZTAKI (HU), INFN (IT), IBM-UK and Total; Trinity College Dublin also listed]
  (NB : for both Monitoring and Information Services)

WP3 : Use Cases
  - Fault Detection & Analysis, Heartbeats [5]
  - Job Status & Progress Monitoring [6]
  - Application Performance Monitoring [1,13]
  - Performance Analysis of Distributed Systems [7]
  - Scheduling Services and Self Tuning Applications [8,10,14,(15)]
  - Data Replication Services [9]
  - Accounting & Auditing [11]
  - Configuration monitoring [12]

WP3 : Decisions (end 2000)
  - Try to track standards & best practice from Global Grid Forum
      - evaluate, steer, adopt, …
  - Other WPs should provide the majority of sensors
      - network, fabric, mass-storage
  - WP3 will provide the instrumentation API
  - Key deliverables will be
      - Performance Services
      - Error / Alert Services
      - Status / Parameter Services
      - Logging / Archival Services
      - (forecasting) - information to enable other WPs to do this
  - WP3 subcontracts archival services (in terms of the data management aspects) ?

Netlogger
  [Diagram: Supervisor, Processing Node and Readout Buffer]
  Acknowledgement : Weidong Li

Sequence Diagram
  [Diagram: lifelines for Supervisor, Readout Buffer and Processing Node; messages Request, Fetch Data, Return data, Result; TIME axis]
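
The message sequence can be sketched with a timestamped event recorded at each step, which is the kind of instrumentation this example relies on. The sketch assumes the messages run Supervisor to Processing Node (Request), Processing Node to Readout Buffer (Fetch Data), and back again (Return data, Result). The `record` helper and all class names are hypothetical, and this is not the Netlogger API.

```python
import time

log = []  # (timestamp, component, event) tuples in the order they occurred


def record(component, event):
    """Append one timestamped event; stands in for a real logging call."""
    log.append((time.perf_counter(), component, event))


class ReadoutBuffer:
    def __init__(self, data):
        self._data = data

    def fetch(self):
        record("ReadoutBuffer", "fetch_data")
        record("ReadoutBuffer", "return_data")
        return self._data


class ProcessingNode:
    def __init__(self, buffer):
        self._buffer = buffer

    def handle_request(self):
        record("ProcessingNode", "request_received")
        data = self._buffer.fetch()
        result = sum(data)  # placeholder for the real processing step
        record("ProcessingNode", "result_sent")
        return result


class Supervisor:
    def __init__(self, node):
        self._node = node

    def run(self):
        record("Supervisor", "request_sent")
        result = self._node.handle_request()
        record("Supervisor", "result_received")
        return result


result = Supervisor(ProcessingNode(ReadoutBuffer([1, 2, 3]))).run()
events = [event for _, _, event in log]

# Elapsed time between consecutive events, one value per step of the sequence.
for (t0, _, e0), (t1, _, e1) in zip(log, log[1:]):
    print(f"{e0} -> {e1}: {(t1 - t0) * 1e6:.1f} us")
```

Plotting such events against time for many requests gives the lifeline-style picture in which slow steps stand out.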

Results
  [Plot: X axis in secs, Y axis “count”]
  Acknowledgement : Weidong Li

Netlogger Summary
  - Example deployment
  - Time resolution
      - NTP (~5 ms)
      - Custom h/w (~50 µs)
  - Thread safety ?
  - Variety of visualisation methods
  - “non-invasive” ?
  - Moving towards the GMA
      - e.g. integration of directory service
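
Why time resolution matters can be shown with a little arithmetic. The sketch below assumes each host clock may be off by at most its synchronisation error, so an interval between timestamps taken on two different hosts is uncertain by up to the sum of the two errors. It also assumes, as a simplification that ignores drift, that two timestamps from the same clock share one offset, which cancels. The function names are hypothetical; only the ~5 ms NTP figure comes from the slide.

```python
NTP_SYNC_ERROR_S = 5e-3  # ~5 ms, the NTP figure quoted on the slide


def interval_with_uncertainty(t_start, t_end, err_start, err_end):
    """Return (interval, worst-case uncertainty) in seconds.

    err_start and err_end bound how far each timestamp's clock may be
    from true time; use 0.0 for both when one clock took both stamps.
    """
    return t_end - t_start, err_start + err_end


def is_resolvable(t_start, t_end, err_start, err_end):
    """True if the measured interval is larger than its worst-case uncertainty."""
    interval, uncertainty = interval_with_uncertainty(
        t_start, t_end, err_start, err_end)
    return abs(interval) > uncertainty


# A 2 ms step between two NTP-synchronised hosts is lost in the noise.
cross_host_short = is_resolvable(0.0, 0.002, NTP_SYNC_ERROR_S, NTP_SYNC_ERROR_S)

# A 50 ms step between the same two hosts can be measured.
cross_host_long = is_resolvable(0.0, 0.050, NTP_SYNC_ERROR_S, NTP_SYNC_ERROR_S)

# The same 2 ms step timed on one host is fine under the stated simplification.
same_host_short = is_resolvable(0.0, 0.002, 0.0, 0.0)

print(cross_host_short, cross_host_long, same_host_short)
```

With better clock synchronisation the same check passes for much shorter cross-host intervals, which is the motivation for custom timing hardware.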

Summary
  - Information Service is KEY to Monitoring
      - …and nature of service to be determined !
  - Unified Information Architecture is important
      - …otherwise duplication and inconsistencies
  - Align with Global Grid Forum for “standards”, etc.
  - Starting point is Netlogger
  - DataGrid deliverable details are testbed “driven”
  - Cross-DataGrid WP - service to many areas