Published by Clare Houston. Modified over 9 years ago.
1
Jeremy Nowell
EPCC, University of Edinburgh
jeremy@epcc.ed.ac.uk
http://www.npm-alarms.org/
A Standards Based Alarms Service for Monitoring Federated Networks
Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew
ICNS 2009, Valencia, 24 April 2009
2
Jeremy Nowell - A Standards Based Alarms Service
Project Background
EPCC is the supercomputing centre at the University of Edinburgh
–Hosts the UK national academic HPC service
–Academic and industrial consultancy
–http://www.epcc.ed.ac.uk/
EPCC has been working in the area of network monitoring for Grids for 5 years
–First within the EGEE project, now more widely
3
Overview
Challenges of monitoring federated networks
Standards-based network monitoring
Why an Alarms Service
Architecture
Examples
Future Work
4
Federated Networks
5
Network Monitoring Challenges
Network Monitoring
–Types: backbone, end-to-end
–Tools: iperf, ping, netflow, perfSONAR
–Data Formats: RRD, SQL, flat file
–User Groups: NOC, GOC, end user
–Administrative Domains: project, NREN, MAN
6
Federated Network Monitoring Challenges
Scale and heterogeneity require support for diversity of all kinds
–Multitude of ways to collect monitoring data
  Different measurement types
    End-to-end: appropriate to the experience of user and application, e.g. TCP achievable bandwidth
    Backbone: lower level measurements, used to pin-point the source of problems
  Different measurement tools
  Different data formats
–Many administrative domains
–Different user groups
7
Federated Networks for Grids
Grids need
–a unified view
–end-to-end performance
  real achievable application performance
8
Federated Network Monitoring Strategy
Use existing tools and data
–Do not try to force adoption of a single tool across many administrative domains
–Instead provide a framework for accessing distributed data
Use standards-based solutions where possible
–Access a wide range of data
–Allow interoperability between grids, projects and networks
9
Standards-Based Network Monitoring
Data federation through use of the schema provided by the Open Grid Forum (OGF) Network Measurements Working Group (NM-WG)
The NM-WG schema allows interoperability between clients and measurement frameworks
10
Standards Based Network Monitoring
EPCC has developed tools for accessing historical network performance data from multiple measurement frameworks
e2emonit
–End-to-end metrics (TCP/UDP achievable bandwidth, RTT, packet loss, OWDV)
–Active measurement tools (iperf, ping, udpmon)
perfSONAR
–Developed by a collaboration including GÉANT2, ESnet, Internet2
–Passive data for router interfaces
  Utilisation, input errors, output drops
–Traceroute information
11
But…
Historical data is only useful for diagnosing problems when you already know something is wrong
What users really need are…
ALARMS
12
Requirements
A network Alarms Service
–Allows the timely detection of problems
–Notifies users
–Gives an “at a glance” view of network status
13
Specific Requirements
Motivated by the LHCOPN
–10 Gb/s private network for moving data generated by the LHC
–perfSONAR based monitoring solution deployed and operated by DANTE
Need the following alarms as a minimum
–Unexpected path changes
–Routing out of private network
–Router Interface Congestion
  Packets lost
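As a rough illustration of the first two alarm conditions, the sketch below compares two successive traceroute hop lists and checks each hop against a set of allowed prefixes. The function names, the prefix list and the example addresses (taken from the reserved documentation ranges) are assumptions made for this example; they are not part of the Alarms Service or of perfSONAR.

```python
from ipaddress import ip_address, ip_network


def path_changed(previous_hops, current_hops):
    """Return True when the ordered hop list differs from the previous one."""
    return list(previous_hops) != list(current_hops)


def hops_outside(hops, private_prefixes):
    """Return the hops that fall outside every allowed prefix."""
    networks = [ip_network(prefix) for prefix in private_prefixes]
    return [
        hop for hop in hops
        if not any(ip_address(hop) in network for network in networks)
    ]


# Hypothetical data: two traceroute results for the same path.
ALLOWED = ["192.0.2.0/24"]
earlier = ["192.0.2.1", "192.0.2.9"]
latest = ["192.0.2.1", "198.51.100.7"]

if path_changed(earlier, latest):
    print("path change detected")
leaked = hops_outside(latest, ALLOWED)
if leaked:
    print("hops outside private network:", leaked)
```

Whether a path change is "unexpected" depends on operational knowledge that this sketch does not model; it only flags any difference between the two hop lists.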
14
Strategy
Query
Detect
Notify
15
Architecture
16
Details
Query
–NM-WG standard queries to perfSONAR RRD and HADES Measurement Archives
  Passive router data: interface errors, drops, utilisation
  Traceroute information
Detect
–Rules based mechanism to process data against rules defined in configuration files
  DROOLS library
Notify
–Output status in a form usable by Nagios
  Status display, notifications, history
–Easily implement more status notifiers
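The Detect and Notify steps can be pictured with a minimal sketch such as the one below. It is a plain Python stand-in and does not use the Drools library or the NM-WG query interface: the `Sample` and `Rule` types, the rule names and the 0.9 utilisation threshold are all hypothetical. The only external convention relied on is the set of Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from dataclasses import dataclass
from typing import Callable

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


@dataclass(frozen=True)
class Sample:
    """One passive measurement for a router interface (hypothetical shape)."""
    interface: str
    utilisation: float  # fraction of capacity, 0.0 to 1.0
    input_errors: int
    output_drops: int


@dataclass(frozen=True)
class Rule:
    """A named condition and the status to raise when it holds."""
    name: str
    condition: Callable[[Sample], bool]
    status: int


# Illustrative rules only; thresholds are assumptions, not project settings.
RULES = [
    Rule("packets lost", lambda s: s.input_errors > 0 or s.output_drops > 0, CRITICAL),
    Rule("high utilisation", lambda s: s.utilisation >= 0.9, WARNING),
]


def detect(sample, rules=RULES):
    """Return (status, message) for the most severe rule that fires."""
    fired = [rule for rule in rules if rule.condition(sample)]
    if not fired:
        return OK, "no rule fired"
    worst = max(fired, key=lambda rule: rule.status)
    return worst.status, worst.name


def notify(sample, status, message):
    """Format a one-line status message in the style Nagios plugins commonly use."""
    return f"{STATUS_NAMES[status]} - {sample.interface}: {message}"


sample = Sample("if0", utilisation=0.95, input_errors=0, output_drops=3)
status, message = detect(sample)
print(notify(sample, status, message))
```

Keeping rules as data separate from the evaluation loop mirrors the idea of defining rules in configuration files, so that new alarm conditions can be added without changing the detection code.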
17
Examples
18
Examples
19
Examples
20
Current Status
Prototype is currently being used by DANTE to monitor some LHCOPN paths and interfaces for the required alarm conditions
–Test functionality
–Gather feedback from users
Will be further developed and deployed to monitor the whole of the LHCOPN during this year
Actively looking for other users
21
Further Work
Implement more alarm conditions
Send status information to other consumers, e.g. network weather map
Think about data processing
–e.g. “cleaning” of data to remove bad data points
–Statistical processing etc.
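One possible form of the data "cleaning" mentioned above is an outlier filter based on the median absolute deviation. The sketch below is only an assumption about what such processing could look like; the slides do not say which method would be used. The function name and the 3.5 cut-off on the modified z-score are illustrative choices.

```python
from statistics import median


def clean(values, threshold=3.5):
    """Drop points whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation of the series.
    """
    values = list(values)
    if not values:
        return []
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread to judge against, so keep every point.
        return values
    return [v for v in values if 0.6745 * abs(v - centre) / mad <= threshold]


# Hypothetical round-trip times in milliseconds with one bad data point.
rtt_ms = [10, 11, 9, 10, 500]
print(clean(rtt_ms))
```

A median-based filter is used here because a single extreme value barely moves the median, whereas it can shift a mean and standard deviation enough to hide itself.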
22
Summary
Monitoring of federated networks is a challenge
An Alarms Service is critical for problem discovery
The LHCOPN is being monitored using an initial version
–and will be developed further to be deployed to monitor the whole network
23
Acknowledgements
–Funding
  UK Joint Information Systems Committee (JISC)
  EGEEII (INFSO-RI-031688)
  DEISA2 (RI-222919)
–Collaboration
  DANTE
  DFN WiN-Labor Erlangen
  LHC-OPN