Measurement on the Internet2 Network: an evolving story Matt Zekauskas Joint Techs, Minneapolis 11-Feb-2007
2 Outline The Internet2 Observatory What we are measuring today The perfSONAR vision What is happening in the near term LHC OPN “e2emon”
3 The Observatory Collect data for operations Understanding the network, and how well it is operating How we started Collect data for research Part of Internet2’s long-standing commitment to network research
4 The Observatory Two components Data collected by NOC and Internet2 itself Ability for researchers to collocate equipment when necessary
5 The New Internet2 Network Expanded Layer 1, 2 and 3 Facilities Includes SONET and Wave equipment Includes Ethernet Services Greater IP Services Requires expanded Observatory
6 In Brief Extends to all optical Add/Drop Sites Add capability: Run the control software Other out-of-band mgmt. tasks Refresh of Observatory Refresh PCs 10G capabilities on IPO 10G capability on Ciena Network (planned, next year) Experimental NetFPGA Cards (planned, next year) Standing up each node as it is installed
7 The New Internet2 Observatory Seek Input from the Community, both Engineers and Network Researchers Current thinking is to support three types of services Measurement (as before) Collocation (as before) Experimental Servers to support specific projects - for example, Phoebus (this is new) Support different types of nodes: Optical Nodes Router Nodes
8 Existing Observatory Capabilities
- One-way latency, jitter, loss: IPv4 and IPv6 (“owamp”)
- Regular TCP/UDP throughput tests, ~1 Gbps: IPv4 and IPv6; on-demand available (“bwctl”)
- SNMP: octets, packets, errors; collected 1/min
- Flow data: addresses anonymized by 0-ing the low-order 11 bits
- Routing updates: both IGP and BGP; measurement device participates in both
- Router configuration: Visible Backbone; collected 1/hr from all routers
- Dynamic updates: syslog; also alarm generation (~nagios); polling via router proxy
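The flow-data anonymization mentioned above (zeroing the low-order 11 bits of each address) can be illustrated with a minimal sketch. It assumes IPv4 addresses only, since the slide does not say how IPv6 addresses are treated; the function name `anonymize_ipv4` and the sample addresses are hypothetical and are not taken from the Observatory's actual tooling.

```python
import ipaddress

LOW_BITS = 11  # number of low-order bits zeroed, as stated on the slide


def anonymize_ipv4(addr: str, low_bits: int = LOW_BITS) -> str:
    """Return addr with its low-order `low_bits` bits set to zero."""
    value = int(ipaddress.IPv4Address(addr))
    # Build a 32-bit mask whose low `low_bits` bits are 0 and the rest are 1.
    mask = (0xFFFFFFFF >> low_bits) << low_bits
    return str(ipaddress.IPv4Address(value & mask))


if __name__ == "__main__":
    # Documentation-range and private addresses, used purely as examples.
    for sample in ("192.0.2.77", "10.1.255.200"):
        print(sample, "->", anonymize_ipv4(sample))
```

With 11 bits zeroed, every address in the same /21 block maps to the same anonymized value, so flows can still be grouped by network without identifying individual hosts.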
9 Observatory Functions
Device | Function | Details
nms-rthr1 | Measurement | BWCTL on-demand 1 Gbps router throughput, Thrulay
nms-rthr2 | Measurement | BWCTL on-demand 10 Gbps router throughput, Thrulay
nms-rexp | Experimental | NDT/NPAD
nms-rpsv | Measurement | Netflow collector
nms-rlat | Measurement | OWAMP with locally attached GPS timing
nms-rpho | Experimental | Phoebus 2 x 10GE to Multiservice Switch
nms-octr | Management | Controls Multiservice Switch
nms-oexp | Experimental | NetFPGA
nms-othr | Measurement | On-demand Multiservice Switch 10 Gbps throughput
10 Router Nodes
11 Router Nodes
12 Optical Nodes
13 Optical Nodes
14 Observatory Hardware Dell 1950 and Dell 2950 servers Dual Core 3.0 GHz Xeon processors 2 GB memory Dual RAID 146 GB disk Integrated 1 GE copper interfaces 10 GE interfaces Hewlett-Packard 10GE switches 9 servers at router sites, 3 planned at optical only sites (initially 1 - control)
15 Observatory Databases – Data Types Data is collected locally and stored in distributed databases Databases Usage Data Netflow Data Routing Data Latency Data Throughput Data Router Data Syslog Data
16 Lots of Work to be Done Internet2 Observatory realization inside racks set for initial deployment, including planning for research projects (NetFPGA, Phoebus) Software and links easily changed Could add or change hardware depending on costs Researcher tools, new datasets Consensus on passive data
17 New Challenges Operations and Characterization of new services Finding problems with stitched together VLANs Collecting and exporting data from Dynamic Circuit Service... Ciena performance counters Control plane setup information Circuit usage (not utilization, although that is also nice) Similar for underlying Infinera equipment And consider inter-domain issues
18 Observatory Requirements Strawman Small group: Dan Magorian, Joe Metzger and Internet2 See document off of Want to start working group under new Network Technical Advisory Committee Interested? Talk to Matt or watch NTAC Wiki on wiki.internet2.edu; measurement page will also have some information…
19 Strawman: Potential New Focus Areas Technology Issues Is it working? How well? How debug problems? Economy Issues – interdomain circuits How are they used? Are they used effectively? Monitor violation of any rules (e.g. for short-term circuits) Compare with “vanilla” IP services?
20 Strawman: Potential High-Level Goals Extend research datasets to new equipment Circuit “weathermap”; optical proxy Auditing Circuits Who requested (at suitable granularity) What for? (ex: bulk data, streaming media, experiment control) Why? (add’l bw, required characteristics, application isolation, security)
21 Inter-Domain Issues Important New services (various circuits) New control plane That must work across domains Will require some agreement among various providers Want to allow for diversity…
22 Sharing Observatory Data We want to make Internet2 Network Observatory Data: Available: Access to existing active and passive measurement data Ability to run new active measurement tests Interoperable: Common schema and semantics, shared across other networks Single format XML-based discovery of what’s available
23 What is perfSONAR? Performance Middleware perfSONAR is an international consortium in which Internet2 is a founder and leading participant perfSONAR is a set of protocol standards for interoperability between measurement and monitoring systems perfSONAR is a set of open source web services that can be mixed-and-matched and extended to create a performance monitoring framework
24 perfSONAR Design Goals Standards-based Modular Decentralized Locally controlled Open Source Extensible Applicable to multiple generations of network monitoring systems Grows “beyond our control” Customized for individual science disciplines
25 perfSONAR Integrates Network measurement tools Network measurement archives Discovery Authentication and authorization Data manipulation Resource protection Topology
26 perfSONAR Credits perfSONAR is a joint effort: ESnet GÉANT2 JRA1 Internet2 RNP ESnet includes: ESnet/LBL staff Fermilab Internet2 includes: University of Delaware Georgia Tech SLAC Internet2 staff GÉANT2 JRA1 includes: Arnes Belnet Carnet Cesnet CYNet DANTE DFN FCCN GRNet GARR ISTF PSNC Nordunet (Uninett) Renater RedIRIS Surfnet SWITCH
27 perfSONAR Adoption R&E Networks Internet2 ESnet GÉANT2 European NRENs RNP Application Communities LHC GLORIAD Distributed Virtual NOC Roll-out to other application communities in 2007 Distributed Development Individual projects (10 before first release) write components that integrate into the overall framework Individual communities (5 before first release) write their own analysis and visualization software
28 Proposed Data to be made available via perfSONAR First Priorities Link status (CIENA data) SNMP data OWAMP BWCTL Second Priorities Additional CIENA data Ethernet stats SONET (Severely errored seconds, etc.) Light levels Similar Infinera data Later: Flow data Feedback? Alternate priorities?
29 What will (eventually) consume data? We intend to create a series of web pages that will display the data Third-party Analysis/Visualization Tools European and Brazilian UIs SLAC-built analysis software LHC OPN E2EMON More … Real applications Network-aware applications Consume performance data React to network conditions Request dynamic provisioning Future Example: Phoebus
JRA4 E2EMon slides From Mauro Campanella, GARR, 2006-Nov Demo:
Connect. Communicate. Collaborate 31 The Italian Research and Education Network Hopi Meeting 3 Nov 2006 Problem space Point A PointB Domain A Domain B Domain C Goal: (near) real-time monitoring (link status) of constituent DomainLinks (and links between domains) and whole end-to-end Link A-B. The following applies to the GÉANT2+ service and the cross border fibres. E2ELink A-B
32 Divide & conquer (JRA4 E2Emon info model) JRA4 view of world: note WDM systems, & static lambdas
33 Approach Point A PointB Domain A Domain B Domain C E2ELink A-B perfSONAR MP or MA perfSONAR MP or MA E2Emon correlator perfSONAR Measurement Point (MP) or Measurement Archive (MA) DomainLink and (partial) ID_Link info “Weathermap” view for users E2ECU operators
34 GARR SWITCH CNAF X BO MI PD KARLSRUHE X DFN X WDM Manno e2e lightpath from CNAF (Bologna, Italy) to Karlsruhe (Germany) The logical topology built for the e2e monitoring system abstracts the internal topology of each domain and produces a simpler topology. LHC-OPN e2e Monitoring
35 Domain1 Domain2 Domain3, Domain4 Domain5 EP Demarcation PointDP End Point ID Link Domain Link Other Domain Links ID Link GARR SWITCH CNAF X BO MI PD KARLSRUHE X DFN X WDM Manno LHC-OPN e2e Monitoring
36 Network 2 Network 1 Network n Domain 1 MP Domain 2 MP Domain n MP Domain 1 MA Domain 2 MA Domain n MA E2E Monitoring System User web services script polls acquisition domain aggregation and xml generation interdomain aggregation LHC-OPN e2e Monitoring
37 GEANT2 GINS (the GARR network monitoring system) checks the status of the logical circuits in the GARR domain and provides the result to the GARR MP. The central e2e measurement system queries each domain and provides the global e2e status. This shows the independence of the domains, the ease of aggregating the information, and the scalability of the approach. GARR end point IP Link X X IL MONITORING GARR monitoring domain CNAF GINS e2e Monitor XML Data GARR MP E2E MS MPLS LSP IP/L2 Link CNAF - CERN GARR monitoring flow
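The two-level aggregation described above (each domain summarizes its own segments, then the central system combines the per-domain results into a global e2e status) can be sketched roughly as follows. This is an illustration under stated assumptions, not the E2Emon implementation: the status values `up`, `down`, and `unknown`, the combining rule, the function name `aggregate_status`, and the domain names are all hypothetical.

```python
from typing import Iterable

# Assumed status vocabulary; the real E2Emon data model may differ.
UP, DOWN, UNKNOWN = "up", "down", "unknown"


def aggregate_status(statuses: Iterable[str]) -> str:
    """Combine segment statuses into one status for the whole path.

    Assumed rule: any segment down -> down; all segments up -> up;
    otherwise (missing or unknown information) -> unknown.
    """
    items = list(statuses)
    if not items:
        return UNKNOWN
    if any(s == DOWN for s in items):
        return DOWN
    if all(s == UP for s in items):
        return UP
    return UNKNOWN


if __name__ == "__main__":
    # Hypothetical per-domain segment reports, as each domain's MP might expose them.
    reports = {
        "domain-1": [UP, UP],
        "domain-2": [UP],
        "domain-3": [UNKNOWN],
    }
    # Domain-level aggregation first, then inter-domain aggregation.
    per_domain = {name: aggregate_status(segs) for name, segs in reports.items()}
    print(per_domain)
    print("e2e:", aggregate_status(per_domain.values()))
```

Because the same rule is applied at both levels, the central system only needs each domain's summary, not its internal topology, which is the independence property the slide points to.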
38 IP MPLS lambda GARR SWITCH CNAF X BO MI PD KARLSRUHE DFN WDM Manno X X lambda GINS e2e Service check the status of segments GINS User GINS User E2E Monitoring System (status aggregation) GARR monitoring domain
39 GARR User Interface
40 VISUALIZATION CNAF - CERN E2E MS user interface
41 VISUALIZATION (Slides from Marco Marletta, Giovanni Cesaroni GARR) CNAF - CERN GARR GINS user interface
42 Measurement System Future work - wish list Define & implement “degraded” link status Add scheduled maintenance indication Add more detail to data model –Break down DomainLink into constituent parts? (e.g. OCh trails) –use more info from equipment