Jeremy Nowell EPCC, University of Edinburgh A Standards Based Alarms Service for Monitoring Federated Networks.

Presentation transcript:

A Standards Based Alarms Service for Monitoring Federated Networks
Jeremy Nowell, EPCC, University of Edinburgh
Kostas Kavoussanakis, Jeremy Nowell, Charaka Palansuriya, Florian Scharinger, Arthur Trew
ICNS 2009, Valencia, 24 April 2009

Project Background
EPCC is the supercomputing centre at the University of Edinburgh
– Hosts the UK national academic HPC service
– Academic and industrial consultancy
EPCC has been working in the area of network monitoring for Grids for 5 years
– First within the EGEE project, now more widely

Overview
– Challenges of monitoring federated networks
– Standards-based network monitoring
– Why an Alarms Service
– Architecture
– Examples
– Future Work

Federated Networks

Network Monitoring Challenges (diagram labels)
– Types: backbone, end-to-end
– Tools: iperf, ping, netflow, perfSONAR
– User Groups: NOC, GOC, end user
– Data Formats: RRD, SQL, flat file
– Administrative Domains: project, NREN, MAN

Federated Network Monitoring Challenges
Scale and heterogeneity pose a requirement to support diversity of all kinds
– Multitude of ways to collect monitoring data
    Different measurement types
    – End-to-end: appropriate to the experience of user and application, e.g. TCP achievable bandwidth
    – Backbone: lower-level measurements, used to pinpoint the source of problems
    Different measurement tools
    Different data formats
– Many administrative domains
– Different user groups

Federated Networks for Grids
Grids need
– A unified view
– End-to-end performance: real, achievable application performance

Federated Network Monitoring Strategy
Use existing tools and data
– Do not try to force adoption of a single tool across multiple large administrative domains
– Instead provide a framework for accessing distributed data
Use standards-based solutions where possible
– Access a wide range of data
– Allow interoperability between grids, projects and networks

Standards-Based Network Monitoring
– Data federation through use of the schema provided by the Open Grid Forum (OGF) Network Measurements Working Group (NM-WG)
– The NM-WG schema allows interoperability between clients and measurement frameworks

Standards-Based Network Monitoring
EPCC has developed tools for accessing historical network performance data from multiple measurement frameworks
e2emonit
– End-to-end metrics (TCP/UDP achievable bandwidth, RTT, packet loss, OWDV)
– Active measurement tools (iperf, ping, udpmon)
perfSONAR
– Developed by a collaboration including GÉANT2, ESnet, Internet2
– Passive data for router interfaces: utilisation, input errors, output drops
– Traceroute information

But…
Historical data is only useful for diagnosing problems when you already know something is wrong
What users really need are… ALARMS

Requirements
A network Alarms Service
– Allows the timely detection of problems
– Notifies users
– Gives an “at a glance” view of network status

Specific Requirements
Motivated by the LHCOPN
– 10 Gb/s private network for moving data generated by the LHC
– perfSONAR-based monitoring solution deployed and operated by DANTE
Need the following alarms as a minimum
– Unexpected path changes
– Routing out of the private network
– Router interface congestion
– Packets lost
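The first two alarm conditions above (unexpected path changes and routing out of the private network) can be illustrated with a minimal Python sketch. This is not the service's actual implementation: the function names, the baseline-comparison approach, and the address ranges (taken from the reserved documentation prefixes) are all assumptions made for illustration.

```python
import ipaddress


def path_changed(baseline_hops, current_hops):
    """Return True when the current traceroute differs from the baseline path."""
    return list(baseline_hops) != list(current_hops)


def hops_outside_network(hops, private_prefixes):
    """Return the hops that do not fall inside any of the allowed prefixes."""
    networks = [ipaddress.ip_network(prefix) for prefix in private_prefixes]
    outside = []
    for hop in hops:
        address = ipaddress.ip_address(hop)
        if not any(address in network for network in networks):
            outside.append(hop)
    return outside


if __name__ == "__main__":
    # Hypothetical paths; 192.0.2.0/24 stands in for the private network.
    baseline = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    current = ["192.0.2.1", "198.51.100.7", "192.0.2.3"]
    if path_changed(baseline, current):
        print("ALARM: unexpected path change")
    leaked = hops_outside_network(current, ["192.0.2.0/24"])
    if leaked:
        print("ALARM: routed outside private network via", ", ".join(leaked))
```

Comparing against a stored baseline is only one possible definition of "unexpected"; a deployed service might instead compare consecutive measurements.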

Strategy
– Query
– Detect
– Notify

Architecture

Details
Query
– NM-WG standard queries to perfSONAR RRD and HADES Measurement Archives
    Passive router data: interface errors, drops, utilisation
    Traceroute information
Detect
– Rules-based mechanism to process data against rules defined in configuration files
    DROOLS library
Notify
– Output status in a form usable by Nagios
    Status display, notifications, history
– Easy to implement more status notifiers
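The Query, Detect, Notify flow can be sketched in a few lines of Python. This is an illustrative stand-in only: the service described here uses the DROOLS rules library and NM-WG queries, whereas the sketch uses plain Python predicates and canned data. The sample fields, thresholds, severities, and interface names are assumptions. The status line loosely follows the Nagios plugin convention of a state word followed by text, with states numbered 0 (OK), 1 (WARNING) and 2 (CRITICAL).

```python
from dataclasses import dataclass

# Nagios plugin state codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


@dataclass
class InterfaceSample:
    interface: str
    utilisation: float  # fraction of capacity, 0.0 to 1.0
    input_errors: int
    output_drops: int


def query_archive():
    """Query step: stand-in for an NM-WG query, returning canned samples."""
    return [
        InterfaceSample("if-a", 0.42, 0, 0),
        InterfaceSample("if-b", 0.97, 0, 12),
    ]


# Detect step: each rule is (predicate, severity, reason).
# The thresholds are hypothetical, not taken from the presentation.
RULES = [
    (lambda s: s.output_drops > 0, CRITICAL, "output drops"),
    (lambda s: s.input_errors > 0, WARNING, "input errors"),
    (lambda s: s.utilisation > 0.9, WARNING, "high utilisation"),
]


def detect(sample):
    """Apply every rule; return the worst severity and the matching reasons."""
    status, reasons = OK, []
    for predicate, severity, reason in RULES:
        if predicate(sample):
            status = max(status, severity)
            reasons.append(reason)
    return status, reasons


def notify(sample, status, reasons):
    """Notify step: format a one-line status message."""
    text = ", ".join(reasons) if reasons else "no alarm conditions"
    return f"{STATUS_NAMES[status]} - {sample.interface}: {text}"


if __name__ == "__main__":
    for sample in query_archive():
        status, reasons = detect(sample)
        print(notify(sample, status, reasons))
```

Keeping the rules in a data structure separate from the detection loop mirrors the idea of rules defined in configuration files: conditions can be added without touching the query or notification code.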

Examples (three slides, figures only)

Current Status
Prototype is currently being used by DANTE to monitor some LHCOPN paths and interfaces for the required alarm conditions
– Test functionality
– Gather feedback from users
Will be further developed and deployed to monitor the whole of the LHCOPN during this year
Actively looking for other users

Further Work
Implement more alarm conditions
Send status information to other consumers, e.g. a network weather map
Think about data processing
– e.g. “cleaning” of data to remove bad data points
– Statistical processing etc.
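As one possible reading of "cleaning" data to remove bad data points, the sketch below drops outliers using the median absolute deviation (MAD). The presentation does not say which method would be used; the technique, the function name, and the threshold of 3.5 are assumptions chosen for illustration.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Drop points whose modified z-score (based on the MAD) exceeds threshold."""
    values = list(values)
    if len(values) < 3:
        return values  # too few points to judge
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return values  # no spread to measure against; keep everything
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - centre) / mad <= threshold]


if __name__ == "__main__":
    # Hypothetical round-trip times in milliseconds with one bad point.
    rtt_ms = [10.1, 9.8, 10.3, 10.0, 250.0, 9.9]
    print(remove_outliers(rtt_ms))
```

A median-based filter is used here because a single extreme value barely shifts the median, whereas it can distort a mean and standard deviation enough to hide itself.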

Summary
Monitoring of federated networks is a challenge
An Alarms Service is critical for problem discovery
The LHCOPN is being monitored using an initial version
– It will be developed further and deployed to monitor the whole network

Acknowledgements
– Funding
    UK Joint Information Systems Committee (JISC)
    EGEEII (INFSO-RI )
    DEISA2 (RI )
– Collaboration
    DANTE
    DFN WiN-Labor Erlangen
    LHC-OPN