Performance Update “10 pounds of stuff in a 5 pound bag” Jeff Boote Senior Network Software Engineer Internet2 Martin Swany Assistant Professor University of Delaware
Overview Performance Measurement Goals and Vision Measurement Tools perfSONAR Transport Middleware
Goals Increase network awareness Set user expectations accurately Reduce diagnostic costs Performance problems noticed early Performance problems addressed efficiently Network engineers can see & act outside their turf Transform application design Incorporate network intuition into application behavior
Vision: Performance Information is … Available People can find it (Discovery) “Community of trust” allows access across administrative domain boundaries Ubiquitous Widely deployed (Paths of interest covered) Reliable (Consistently configured correctly) Valuable Actionable (Analysis suggests course of action) Automatable (Applications act on data)
NDT 3.4.1 is current version Latest enhancements were related to administrator ability to analyze data using JAnalyze (Google summer of code project) Test points available at all Internet2 IP network router locations: ndt.POP.net.internet2.edu POP=losa,salt,hous,kans,chic,atla,newy,wash
OWAMP (One-way latency data) 3.0c (RFC 4656 version) available now Maintenance mode Diagnostic test points available at all Internet2 IP Network router locations: owamp.POP.net.internet2.edu POP = losa,salt,hous,kans,chic,atla,newy,wash
BWCTL (Throughput tests) 1.2a is current version 1.3 in testing (new testers: nuttcp, thrulay) Diagnostic test points available at all Internet2 IP Network router locations: bwctl.POP.net.internet2.edu POP = losa,salt,hous,kans,chic,atla,newy,wash
NPToolKit Recent versions of Measurement tools installed and pre-configured Knoppix Live-CD bootable system Current Version: 1.9 http://e2epi.internet2.edu/network-performance-toolkit.html
What is perfSONAR A collaboration Production network operators focused on designing and building tools that they will deploy and use on their networks to provide monitoring and diagnostic capabilities to themselves and their user communities. An architecture & a set of protocols Web Services Architecture Protocols based on the Open Grid Forum Network Measurement Working Group Schemas Several interoperable software implementations Java & Perl A Deployed Measurement infrastructure
perfSONAR Collaborators RNP ARNES BELNET CARNET CESNET CYNET DANTE DFN ESnet FCCN FERMI GARR GEANT GRNET HEAnet Internet2 ISTF POZNAN UNINETT University of Delaware Renater RedIRIS SLAC SWITCH SURFnet And anybody else I missed
perfSONAR Architecture Interoperable network measurement middleware: Modular Web services-based Decentralized Locally controlled Integrates: Network measurement tools Network measurement archives Discovery Authentication and authorization Data manipulation Resource protection Topology Based on: Open Grid Forum Network Measurement Working Group schema.
perfSONAR-PS Motivation Create separate implementation of perfSONAR standard Use same protocol/standards Proof of interoperability (strengthens the standard) Targeted for NOC deployments Lightweight Easy to deploy/manage (We were unable to convince our primary users to deploy Java services due to the complexity of dependencies)
perfSONAR-PS Beta Release (0.06) (1/21/08) Focus on development of major perfSONAR components LS - perfSONAR_PS::Services::LS::LS SNMP MA - perfSONAR_PS::Services::MA::SNMP Status MA - perfSONAR_PS::Services::MA::Status CircuitStatus MA - perfSONAR_PS::Services::MA::CircuitStatus Topology MA - perfSONAR_PS::Services::MA::Topology PingER (SLAC) * Not yet released OWAMP/BWCTL archive (perfSONARBUOY) Not released via CPAN
SNMP Measurement Archive Provide access to network performance data Utilization Errors Discards Numerous tools exist to collect passive measurements (via SNMP): MRTG Cacti Cricket Expose archives from RRD files
SNMP Measurement Archive Current Deployment: Internet2 Network ESnet Georgia Tech/SOX Fermilab
PingER-Based MP/MA Joint effort between Fermilab and SLAC Present views of historic PingER data Expose interface to schedule live tests Built with perfSONAR-PS infrastructure
Link Status Measurement Archive Provide access to up/down status information about layer2 links Data stored in a SQL database Database schema allows for storing time ranges during which a link had a certain status Minimizes storage costs for rarely changing links Communication/Configuration via XML Target audience is network operators and users interested in obtaining the status of the links over which their data flows
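The storage saving from time ranges can be sketched as follows: consecutive samples with the same status collapse into one row, so a link that rarely changes occupies very few rows. This is an illustration only; the StatusRange fields and the compress function are assumptions, not the actual perfSONAR-PS database schema.

```python
from dataclasses import dataclass


@dataclass
class StatusRange:
    """One stored row: a link held `status` from `start` to `end` (hypothetical layout)."""
    link_id: str
    status: str
    start: int  # timestamp of the first sample with this status
    end: int    # timestamp of the latest sample with this status


def compress(link_id, samples):
    """Collapse (timestamp, status) samples, sorted by time, into status ranges."""
    ranges = []
    for timestamp, status in samples:
        if ranges and ranges[-1].status == status:
            # Status unchanged: extend the open range instead of adding a row.
            ranges[-1].end = timestamp
        else:
            ranges.append(StatusRange(link_id, status, timestamp, timestamp))
    return ranges
```

With one sample per minute, a link that stays up all day produces a single row rather than 1440.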
Link Status Measurement Archive Collector Allows for the periodic collection of the status of one or more links Can use SNMP, Scripts or simply Constants Can store results directly into a database or into a remote Measurement Archive
Link Status Measurement Archive Visualization A perfSONAR-UI Plugin is available that can display a network and the status of its links Current Deployment Internet2 Network HOPI (in2p3 circuit) Planned Deployment SLAC
Circuit Status Measurement Archive An e2emon-compatible service Integrates with the Link Status MA to provide the information stored in MAs Can work with local MAs directly or with remote MAs Can use the Topology MA to obtain necessary information about nodes Can use a Lookup Service to look up the MA containing information on each link Target audience is administrators who want to publish circuit status information to e2emon clients
Circuit Status Measurement Archive Visualization Any tool that is compatible with e2emon will work with this service Current Deployment Internet2 Network HOPI (in2p3 circuit) Planned Deployment SLAC
Topology Service Provides a queryable repository for obtaining topology information about a domain Can obtain the entire network XQuery interface allows the construction of complex queries about the network Topology is specified according to the schema in development in the OGF
Topology Service Planned Deployments Current Deployments Internet2 Internet2 DCN SLAC (PingER Topology Information)
perfSONAR Lookup Service Directory service of perfSONAR deployments Accepts service registrations Handles queries for service location and capabilities and location of available data Manages the lifetimes of data and services to keep information up to date Web Service interface to XML Database Sleepycat XML Database Service Info/Data kept in native formats Draws away the complex query tasks from otherwise 'busy' services
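The register, query and lifetime behaviour described above can be sketched in a few lines. This is an in-memory stand-in under stated assumptions: the real Lookup Service stores XML in an XML database and is queried with XPath/XQuery, and every class, method and field name below is hypothetical.

```python
import time


class LookupRegistry:
    """Toy directory: services register with a lifetime and expire unless renewed."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # access point -> (expiry time, metadata dict)

    def register(self, access_point, metadata, ttl_seconds):
        """Record a service and its metadata for ttl_seconds."""
        expiry = self._clock() + ttl_seconds
        self._entries[access_point] = (expiry, dict(metadata))

    def keepalive(self, access_point, ttl_seconds):
        """Extend the lifetime of a registered service (KeyError if unknown)."""
        _, metadata = self._entries[access_point]
        self._entries[access_point] = (self._clock() + ttl_seconds, metadata)

    def purge(self):
        """Drop every registration whose lifetime has run out."""
        now = self._clock()
        self._entries = {k: v for k, v in self._entries.items() if v[0] > now}

    def query(self, **criteria):
        """Return access points whose metadata matches all given key/value pairs."""
        self.purge()
        return sorted(
            ap for ap, (_, md) in self._entries.items()
            if all(md.get(k) == v for k, v in criteria.items())
        )
```

Passing the clock in as a parameter keeps expiry testable without sleeping; a deployment would use the default wall-clock time.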
Lookup Service Also XML based configuration/protocol Native storage/query mechanisms [Xpath/XQuery] Message format to exchange the data Targeted at single domain deployment Single instance to manage multiple services Client components and applications use the LS to find services perfSONAR-UI perfAdmin
Lookup Service Current Deployment: Internet2 (Ann Arbor) University of Delaware Planned Deployment: IU for Internet2 network and regionals International Partners
Distributed Lookup Service Federation of individual LS instances into a global system “Meta”-lookup phase allows a query to find the specific LS that has relevant information Or perhaps the relevant LSes that have said info The specific query is sent directly to the LS in question Recent active design and development
Distributed Lookup Service Service and measurement metadata is “summarized” for propagation to distant domains IP addresses in service and measurement metadata are compressed into network/netmask pairs in the same way that routes are advertised (CIDR-style) These summarized metadata elements are advertised to external “scopes” A “scope” is a set of LSes that are related by e.g. being in the same administrative domain (although multiple scopes within a single domain are possible)
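The CIDR-style summarization and the “meta”-lookup phase can be sketched with Python's standard ipaddress module. This is an assumption-laden illustration, not the perfSONAR implementation: it performs only exact aggregation of adjacent addresses (it never widens a prefix to cover addresses that were not registered), it assumes IPv4 input, and the LS names are made up.

```python
import ipaddress


def summarize(addresses):
    """Collapse individual IPv4 addresses into the smallest exact set of prefixes."""
    networks = (ipaddress.ip_network(a) for a in addresses)  # each becomes a /32
    return list(ipaddress.collapse_addresses(networks))


def candidate_lses(summaries, target):
    """Meta-lookup: name the LSes whose advertised prefixes contain the target."""
    address = ipaddress.ip_address(target)
    return sorted(
        ls for ls, networks in summaries.items()
        if any(address in network for network in networks)
    )
```

A client would run candidate_lses against the advertised summaries first, then send its specific query directly to each LS returned.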
Weather Maps - Internet2
Gmaps from SLAC
CNM from DFN
CNM from DFN
perfSONARUI from acad.bg
PerfsonarUI 1
PerfsonarUI 2
PerfsonarUI 3
Oscars Circuit plugin - Internet2
Oscars circuit plugin
E2Emon - Monitoring Circuits
E2Emon: Status of E2E link CERN-LHCOPN-FNAL-001 E2Emon generated view of the data for one OPN link [E2EMON]
Traceroute Visualizer
Forward direction bandwidth utilization on application path from LBNL to INFN-Frascati (Italy)
Traffic shown as bars on those network device interfaces that have an associated MP service (the first 4 graphs are normalized to 2000 Mb/s, the last to 500 Mb/s)
1 ir1000gw (131.243.2.1)
2 er1kgw
3 lbl2-ge-lbnl.es.net
4 slacmr1-sdn-lblmr1.es.net (GRAPH OMITTED)
5 snv2mr1-slacmr1.es.net (GRAPH OMITTED)
6 snv2sdn1-snv2mr1.es.net
7 chislsdn1-oc192-snv2sdn1.es.net (GRAPH OMITTED)
8 chiccr1-chislsdn1.es.net
9 aofacr1-chicsdn1.es.net (GRAPH OMITTED)
10 esnet.rt1.nyc.us.geant2.net (NO DATA)
11 so-7-0-0.rt1.ams.nl.geant2.net (NO DATA)
12 so-6-2-0.rt1.fra.de.geant2.net (NO DATA)
13 so-6-2-0.rt1.gen.ch.geant2.net (NO DATA)
14 so-2-0-0.rt1.mil.it.geant2.net (NO DATA)
15 garr-gw.rt1.mil.it.geant2.net (NO DATA)
16 rt1-mi1-rt-mi2.mi2.garr.net
17 rt-mi2-rt-rm2.rm2.garr.net (GRAPH OMITTED)
18 rt-rm2-rc-fra.fra.garr.net (GRAPH OMITTED)
19 rc-fra-ru-lnf.fra.garr.net (GRAPH OMITTED)
20
21 www6.lnf.infn.it (193.206.84.223) 189.908 ms 189.596 ms 189.684 ms
Link capacity is also provided
Phoebus Motivation We’re addressing performance problems and easing adoption of DC network circuits by deploying intelligent network services like Phoebus, in order to actively enable users to better leverage their network connectivity (and network investment) by consistently achieving maximum performance. The Phoebus service seeks to bridge the E2E Performance Gap by providing end-users a seamless way to access new types of high performance networks like the Dynamic Circuit (DC) Network to maximize their application performance. Standard users aren’t network engineers and shouldn’t have to be.
The State of the Net High loss due to shared infrastructure High latency due to distance This is the environment that a typical connection contends with TCP is negatively affected by the interaction of loss and round-trip time
Phoebus in Action Phoebus is based on the concept of creating a unique data-moving “session” for each application. Each time an application is run, specific adaptation points in the backbone, known as Phoebus Gateways, are utilized to determine the best, highest-performance path. For example, a file transfer application may traditionally use the IP network. Once the application is set in motion, Phoebus determines the best network path from end to end for this specific application, which could include a combination of IP, DC or other future services. Since the intelligence is in the core of the network, Phoebus enables all types of applications to leverage improved network performance with little to no modification by the end-user. The Phoebus model is applicable to future applications as well and may prove to be a factor in the evolution of data transport technology.
The Phoebus Model Phoebus is a framework and protocol for high-performance networks Phoebus works to transparently split the end-to-end network path into distinct segments Adaptation points are typically chosen at the ingress and egress points of the backbone This minimizes the negative effects of high latency and packet loss on data transfer By localizing their effects By allocating dedicated resources to mitigate the issues
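One way to see why localizing loss and latency helps is a widely used approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) × sqrt(3/2) / sqrt(p), where p is the packet loss probability (the Mathis et al. model). The slides do not cite this model; it is assumed here purely for illustration, and the segment RTTs and loss rates below are hypothetical numbers, not measurements.

```python
import math


def tcp_rate_bps(mss_bytes, rtt_s, loss):
    """Approximate steady-state TCP throughput in bits/s (Mathis et al. model)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss)


MSS = 1460  # bytes, a typical Ethernet TCP payload size

# Hypothetical path: (round-trip time in seconds, loss probability) per segment.
SEGMENTS = [
    (0.005, 1e-4),  # near-end edge: short but shared and lossy
    (0.070, 1e-7),  # backbone: long but on dedicated resources
    (0.005, 1e-4),  # far-end edge
]


def end_to_end_bps(segments, mss=MSS):
    """One TCP connection sees the summed RTT and the combined loss of all segments."""
    rtt = sum(r for r, _ in segments)
    delivered = math.prod(1 - p for _, p in segments)
    return tcp_rate_bps(mss, rtt, 1 - delivered)


def split_path_bps(segments, mss=MSS):
    """With one connection per segment, the slowest segment bounds the session."""
    return min(tcp_rate_bps(mss, r, p) for r, p in segments)
```

Under these assumed numbers the split path comes out more than an order of magnitude faster, because the edge loss is paired with a short RTT and the long RTT is paired with a low-loss segment. The model ignores gateway buffering and processing overhead, so it bounds the benefit rather than predicting it.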
The Phoebus Model - Cont’d Transparent adaptation for existing applications Perform well to nearest Phoebus Gateway and allow the system to do the rest No modification necessary for most applications The Phoebus system has the ability to optimize the performance with a variety of techniques and insights into the state of the network
Phoebus-Enabled DC Network End-to-End Session DC Network
Session Layer Protocol The Phoebus Session Protocol (PSP) can be used to manage a multi-layer connection PSP TCP Layer 2 (e.g. DCN) No TCP at all in the middle. We can do optimized protocols over circuits The key “glue” that pulls these different things together is the Phoebus Session Protocol
Enabling Applications Phoebus can be enabled on Linux systems with software Applications don’t need to be recompiled Windows support under investigation Alternatively, we can intercept certain traffic with a special host acting as a router No modifications to the users’ workstations
Phoebus - Future Deployment in nine router POPs over the next few months Simple file transfer tool Transparently use Phoebus/Dynamic Circuits Utilize Measurement Infrastructure Help find best routes, provide information about paths and achievable bandwidth Extension of Path Finding / Routing Authentication and Authorization
Protocols and Schema Documents Base network measurement schema OGF Network Measurement Working Group Topology Schema OGF Network Markup Language WG Includes Topology Network ID perfSONAR Protocol Documents perfSONAR Consortium
Schema/Protocol Developments The perfSONAR Topology schema is also used in the DCN control plane We’ve spent quite a bit of effort harmonizing these The obvious win is that the measurement system has immediate access to dynamic circuits The broader impact is that we’re approaching a unified network interaction model (UNIM)
Schema - Network Element Identifiers A scheme for identifying network elements Each network element gets a unique identifier This identifier will be included with any measurement associated with that element.
Network Element Identifiers Use Cases: A topology service can be used to find the identifier for a network element An LS could then be queried to find all measurements associated with that element Dynamic service path-finding can be integrated with ongoing measurements
Network Element Identifiers Identifiers use URN notation Prefixed with “urn:ogf:network:” Consists of name/value pairs separated by colons Possible field names: domain, node, port, link, path, network Set of rules defined for each field to keep identifiers compact and finite
Network Element Identifiers Examples
urn:ogf:network:domain=Internet2.edu
urn:ogf:network:domain=internet2.edu:node=packrat
urn:ogf:network:domain=internet2.edu:node=rtr.seat:port=so-2%2F1%2F0.16
urn:ogf:network:domain=internet2.edu:node=rtr.seat:port=198.32.8.200
urn:ogf:network:domain=Internet2.edu:node=packrat:port=eth0:link=1
urn:ogf:network:domain=internet2.edu:link=WASH to ATLA OC192
urn:ogf:network:path=anna-11537-176
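A minimal sketch of building and parsing identifiers in this notation follows. The prefix, field names and the %2F escaping of “/” come from the slides; everything else is an assumption, including the function names, percent-encoding every reserved character (the link example above carries raw spaces, so the real rules evidently differ), and the lack of handling for values that themselves contain colons.

```python
from urllib.parse import quote, unquote

PREFIX = "urn:ogf:network:"
FIELDS = ("domain", "node", "port", "link", "path", "network")


def build_urn(**fields):
    """Join name=value pairs, in the order given, into an identifier."""
    parts = []
    for name, value in fields.items():
        if name not in FIELDS:
            raise ValueError(f"unknown field: {name}")
        # Assumed rule: percent-encode reserved characters such as "/" and ":".
        parts.append(f"{name}={quote(value, safe='')}")
    return PREFIX + ":".join(parts)


def parse_urn(urn):
    """Split an identifier into a dict of decoded field values."""
    if not urn.startswith(PREFIX):
        raise ValueError("not an OGF network identifier")
    fields = {}
    for part in urn[len(PREFIX):].split(":"):
        name, separator, value = part.partition("=")
        if not separator or name not in FIELDS:
            raise ValueError(f"malformed component: {part}")
        fields[name] = unquote(value)
    return fields
```

Parsing the third example above recovers the interface name so-2/1/0.16 for the port field, which is the form a router would report.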
Distributed Systems Infrastructure perfSONAR, DCN Control Plane and Phoebus have similar system requirements Lookup and Topology Services comprise a generic Information Service that is useful to all these Network Services Authentication and Policy services are cross-cutting as well Rather than have silos of mission-specific functionality, we envision pervasive system components
Distributed Systems Infrastructure Synergies of information bases are obvious Multi-layer path-finding including current network state, available resources on a variety of layers It is a compelling vision to imagine a dynamic, reactive, visible service-rich network
Summary A rich set of tools is being developed To federate network monitoring and diagnostics To enable dynamic network resource allocations To leverage new network capabilities from an ‘end-user’ application (Phoebus) A longer view toward an evolution of “in the network” services
Questions? Jeff Boote Martin Swany boote@internet2.edu swany@cis.udel.edu