– 1 – Update on GGF Measurement Activities
Bruce Lowekamp
The College of William and Mary
– 2 – My Research
William and Mary Computer Science Department
- 14 faculty
- 45 PhD students
- 60 master's students
- Lots of good undergraduates
Research
- Remos: SNMP-based topology and utilization for distributed applications
- Wren: leveraging topology and passive measurements for scalable grid network performance measurement
- Optimistic grid computing: fine-grained applications on a grid
- GGF Network Measurements Working Group
– 3 – Passive Network Measurement
- When an application is running, use passive measurements; when it is not, use active probes.
- The monitoring system controls this choice, since it knows when measurements are needed.
- Conversion between measurements is important.
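The selection policy on this slide can be sketched as a small function. This is a minimal illustration, assuming a hypothetical monitor that knows whether application traffic is currently flowing on a path; `PathState` and `choose_measurement` are invented names, not part of Wren or any GGF specification, and the conversion between passive and active results is not shown.

```python
from dataclasses import dataclass


@dataclass
class PathState:
    """Hypothetical monitor-side view of one network path."""
    path: str
    app_traffic_active: bool  # True while an application is sending on this path


def choose_measurement(state: PathState) -> str:
    """Pick passive or active measurement for a path.

    Passive measurement observes the application's own traffic, so it is only
    available (and adds no extra load) while that traffic is flowing; otherwise
    the monitor falls back to active probes.
    """
    return "passive" if state.app_traffic_active else "active"


if __name__ == "__main__":
    for state in (PathState("A->B", True), PathState("A->C", False)):
        print(state.path, choose_measurement(state))
```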
– 4 – Importance of Topology in Grids
- No one rule governs performance.
- Real users (and system owners) make bad choices.
- Grid applications must optimize performance for these environments.
- Can we exploit topology knowledge for better measurements? For better application performance?
– 5 – Outline
- GGF’s perspective on network measurement
- GMA: Grid Monitoring Architecture
- DAMED: Top-N Events
- NMWG: Network Characteristics Hierarchy
– 6 – GGF Perspective
Users of measurements:
- Application designers
- Runtime system designers
Many users, many environments:
- Grid applications must be flexible and portable.
– 7 – Information Portability
Information must be portable:
- Each AS/VO may pick its own measurement system.
- Parts of the network aren’t measured.
- Systems measure different parameters.
Goal: the application runs unaware of its environment.
- It uses information from multiple measurement systems.
- It should not have to support 10 different performance models.
– 8 – GGF Projects
- GMA: Grid Monitoring Architecture. What components do we need?
- DAMED: Discovery And Monitoring Event Description. What are the Top-N events we need to support RIGHT NOW?
- NMWG: Network Measurements. What does “bandwidth” mean anyway?
These are components of a global information service, and the work boils down to schemas and protocols.
– 9 – Existing Pieces
Many of these components already exist or are in progress:
- Instrumentation tools: Pablo (UIUC), NetLogger (LBNL), log4j (Apache), web100, SNMP, etc.
- Host and network sensors: too many to list
- Sensor management tools: JAMM (LBNL)
- Event publication service: MDS (Globus), NWS (UCSB), R-GMA (RAL), CODE (NASA Ames), Remos (CMU)
- Event archive service: netarchd (LBNL), NWS (UCSB)
- Event analysis and visualization tools: lots, but most only work for specific types of events, e.g. NetLogger nlv (LBNL), Probe (Stazi), Autopilot (UIUC)
BUT all use different event formats and protocols, so there is no interoperability!
– 10 – Event Publication
Handling potentially huge amounts of event data requires an event publication and subscription service that is:
- flexible
- highly scalable
- able to provide near real-time access to monitoring data
The Global Grid Forum (GGF) has defined the “Grid Monitoring Architecture” (GMA) for this purpose.
- Several GMA implementations have started to appear.
- A great deal of work remains to define standard event schemas and event dictionaries for the GMA.
– 11 – GMA Terminology and Architecture
- (Performance) Event: typed collection of data with a specific structure
- Producer Interface: makes performance data (events) available
- Consumer Interface: receives performance data (events)
- Directory Service: supports information publication and discovery; must be distributed and/or replicated
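The three GMA roles on this slide can be illustrated with a toy, single-process sketch: producers register the event types they offer with a directory, consumers look producers up and subscribe, and producers then deliver events. All class and method names here are hypothetical, the event type string is made up, and the sketch shows only a subscribe/publish interaction; it is not the GMA specification's interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, DefaultDict, Dict, List


@dataclass
class Event:
    """A (performance) event: a typed collection of structured data."""
    event_type: str
    data: Dict[str, Any]


class Producer:
    """Makes events available to the consumers that subscribe to them."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(callback)

    def publish(self, event: Event) -> None:
        # Deliver the event to every consumer subscribed to its type.
        for callback in self._subscribers.get(event.event_type, []):
            callback(event)


class Directory:
    """Toy directory service mapping event types to producers.

    A real GMA directory service must be distributed and/or replicated;
    this single in-memory dictionary is for illustration only.
    """

    def __init__(self) -> None:
        self._producers: DefaultDict[str, List[Producer]] = defaultdict(list)

    def register(self, event_type: str, producer: Producer) -> None:
        self._producers[event_type].append(producer)

    def lookup(self, event_type: str) -> List[Producer]:
        return list(self._producers.get(event_type, []))


class Consumer:
    """Finds producers through the directory, then receives their events."""

    def __init__(self) -> None:
        self.received: List[Event] = []

    def subscribe_all(self, directory: Directory, event_type: str) -> int:
        producers = directory.lookup(event_type)
        for producer in producers:
            producer.subscribe(event_type, self.received.append)
        return len(producers)


if __name__ == "__main__":
    directory = Directory()
    sensor = Producer("rtt-sensor")
    directory.register("path.rtt", sensor)  # hypothetical event type name

    consumer = Consumer()
    consumer.subscribe_all(directory, "path.rtt")
    sensor.publish(Event("path.rtt", {"rtt_ms": 12.5}))
    print(consumer.received)
```

The directory only brokers discovery; event data flows directly from producer to consumer, which is what lets the architecture scale to large event volumes.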
– 12 – DAMED WG
Discovery And Monitoring Event Description Working Group
Chairs: Jennifer Schopf (ANL), James Magowan (IBM)
Top-N Metrics
– 13 – DAMED Charter
Define a basic set of monitoring event descriptions:
- the information (attributes) associated with a particular data element
- conventions for representing the value associated with it
Develop standard representations of the most widely used measurement values (the "top N").
Expected outcome: the emergence of a set of conventions and recommendations that will ease the task of defining richer, domain-specific schemas.
- Damed if we do: not everyone will be happy.
- Damed if we don’t: we never reach our goal of seamless interoperability of grids (one big grid, like the Internet).
– 14 – DAMED Terminology
Events have:
- Event Target
- Event Type
- Event Name = Target.Type
Example: target network.link with type delay.TCP gives the event name network.link.delay.TCP
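The naming rule Event Name = Target.Type is easy to apply, but because both the target and the type may themselves contain dots, splitting a name back into its parts needs a list of known targets. The sketch below assumes the longest matching target is the intended one; the helper names are hypothetical and not taken from the DAMED documents.

```python
from typing import Iterable, Tuple


def event_name(target: str, event_type: str) -> str:
    """Compose an event name as Target.Type."""
    if not target or not event_type:
        raise ValueError("target and event type must both be non-empty")
    return f"{target}.{event_type}"


def split_event_name(name: str, known_targets: Iterable[str]) -> Tuple[str, str]:
    """Recover (target, type) from a composed name.

    Both parts may contain dots, so the split needs a list of known targets;
    this sketch assumes the longest matching target is the intended one.
    """
    for target in sorted(known_targets, key=len, reverse=True):
        prefix = target + "."
        if name.startswith(prefix) and len(name) > len(prefix):
            return target, name[len(prefix):]
    raise ValueError(f"no known target is a prefix of {name!r}")


print(event_name("network.link", "delay.TCP"))  # network.link.delay.TCP
print(split_event_name("network.link.delay.TCP", ["host", "network", "network.link"]))
# ('network.link', 'delay.TCP')
```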
– 15 – Target Types
Targets used in Top-N events, with their identifiers:
- Host: IP
- Process: IP, PID
- Disk Partition: /home
- Network Link: IP {port}, IP {port}
- Software: string
- Scheduler: IP, string
Targets are not necessarily hierarchical.
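One way to use this table is to check that an event's target carries the identifiers its type requires. The sketch below is a hypothetical flat lookup (matching the note that targets are not necessarily hierarchical); the key and attribute names are invented for illustration, and the braces around the network link ports are read as "optional".

```python
from typing import Dict, Mapping, Tuple

# (required, optional) identifying attributes per Top-N target type, following
# the slide. Keys and attribute names are hypothetical, not a DAMED schema.
TARGET_ATTRIBUTES: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "host": (("ip",), ()),
    "process": (("ip", "pid"), ()),
    "disk_partition": (("path",), ()),  # e.g. "/home"
    "network_link": (("src_ip", "dst_ip"), ("src_port", "dst_port")),  # ports optional
    "software": (("name",), ()),
    "scheduler": (("ip", "name"), ()),
}


def validate_target(kind: str, attrs: Mapping[str, object]) -> None:
    """Raise ValueError unless attrs identifies a target of the given kind."""
    if kind not in TARGET_ATTRIBUTES:
        raise ValueError(f"unknown target type {kind!r}")
    required, optional = TARGET_ATTRIBUTES[kind]
    missing = [a for a in required if a not in attrs]
    unexpected = [a for a in attrs if a not in required and a not in optional]
    if missing or unexpected:
        raise ValueError(f"{kind}: missing {missing}, unexpected {unexpected}")


validate_target("network_link", {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"})
validate_target("process", {"ip": "10.0.0.1", "pid": 4242})
```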
– 16 – Event Types (Top-N)
- CPU load
- System uptime
- Disk size
- Disk used
- TCP available bandwidth
- Ping RTT
- Traceroute number of hops
- Running software status
- Packet loss
- Available memory
- Host architecture
- Host OS
- Physical memory
– 18 – NMWG
Network Measurements Working Group
Chairs: Brian Tierney (LBNL), Bruce Lowekamp (W&M), Richard Hughes-Jones (Manchester)
Goal: portability of network measurements
Steps:
- Define a hierarchy of measurements.
- Establish a mapping from tools to the measurements they produce.
- Convert between measurements of the same type.
– 19 – Characteristics Hierarchy
Ultimate goal: portability of measurements across many APIs and many tools.
The natural grid development process brings:
- more measurement systems
- more measurement tools
- more cooperation
- more shared, deployed infrastructure
Middleware must be able to determine what a given piece of network performance information is measuring.
How do we share measurement information without discouraging the development of new APIs and tools?
– 20 – How the Nomenclature Helps
We need to classify measurements:
- What does it measure? This is sometimes more important than how it was measured.
Not necessarily a new schema:
- It should be a good schema for network measurements, but not all systems are (or should be) organized this way.
- It can be used as an annotation in any schema.
The goal is an agreed-upon classification of measurements, so that both current and future measurement methodologies can classify their observations and maximize their portability.
– 21 – Representing a Measurement
A measurement is represented by two elements:
- Characteristic: what is being measured (bandwidth, latency, etc.)
- Network Entity: the part of the network described by the measurement (link, path, host, etc.)
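The two-element representation maps naturally onto a pair of small records. This is a sketch under assumptions: the `value` and `unit` fields are added only so the example carries a number, and the field names and string labels are hypothetical rather than an NMWG schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NetworkEntity:
    """The part of the network a measurement describes."""
    kind: str                   # "link", "path", "host", ...
    endpoints: Tuple[str, ...]  # e.g. (source, destination) for a path


@dataclass(frozen=True)
class Measurement:
    """Pairs *what* is measured (characteristic) with *where* (network entity)."""
    characteristic: str         # e.g. "bandwidth", "latency"
    entity: NetworkEntity
    value: float                # assumption: not one of the two slide elements
    unit: str                   # assumption: not one of the two slide elements

    def describe(self) -> str:
        where = " -> ".join(self.entity.endpoints)
        return f"{self.characteristic} of {self.entity.kind} {where}: {self.value} {self.unit}"


m = Measurement("latency", NetworkEntity("path", ("hostA", "hostB")), 42.0, "ms")
print(m.describe())  # latency of path hostA -> hostB: 42.0 ms
```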
– 22 – Terminology
- Network Characteristics: intrinsic properties of a portion of the network that are related to its performance and reliability
- Measurement Methodologies: means and methods of measuring those characteristics
- Observation: an instance of information obtained by applying a measurement methodology
Note on IETF IPPM (RFC 2330): compatible where possible, but "metric" means many things. Guiding principle: clear meanings, following standards where they are defined.
– 23 – Network Characteristics
“Intrinsic property”:
- the property itself, not an observation
- unrelated to how the measurement is made
- not a particular number
Example: Packet Loss
- Fraction of traffic
- Loss patterns
- Traffic profile
– 24 – Measurement Methodology
A technique for recording or estimating a characteristic. Two approaches:
- Raw: measuring the actual characteristic
- Derived: aggregating or estimating from other characteristics
Example, round-trip delay:
- ping
- TCP transmit/ACK pair
- two one-way delay measurements
- link propagation and queue length data
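The last two round-trip delay methods on this slide are derived ones. A minimal sketch of both follows, assuming one-way delays in seconds and per-link data given as (propagation delay, queue length in bytes, link rate in bit/s). Queueing delay is approximated as the time to drain the queue, and serialization and processing delays are ignored; the function names are hypothetical.

```python
from typing import Sequence, Tuple

# A link is (propagation_delay_s, queue_length_bytes, link_rate_bps).
Link = Tuple[float, float, float]


def rtt_from_one_way(forward_s: float, reverse_s: float) -> float:
    """Derived round-trip delay: forward plus reverse one-way delay."""
    return forward_s + reverse_s


def one_way_from_links(links: Sequence[Link]) -> float:
    """Estimate one-way delay from per-link propagation and queue data.

    Queueing delay on each link is approximated as the time to drain the
    queue (queue_bytes * 8 / link_rate_bps); serialization and processing
    delays are ignored in this sketch.
    """
    return sum(prop + qbytes * 8 / rate for prop, qbytes, rate in links)


def rtt_from_links(forward: Sequence[Link], reverse: Sequence[Link]) -> float:
    """Derived round-trip delay from the links of both directions."""
    return rtt_from_one_way(one_way_from_links(forward), one_way_from_links(reverse))


forward = [(0.010, 125_000, 100e6)]  # 10 ms propagation, 1 Mbit queued at 100 Mbit/s
reverse = [(0.010, 0, 100e6)]        # same propagation, empty queue
print(rtt_from_links(forward, reverse))
```

Keeping the two directions separate matters because forward and reverse paths can differ, so a derived round-trip value is only as good as both one-way inputs.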
– 25 – Observations
- Singleton: the smallest possible observation
- Sample: several singletons together
- Statistical: derived from a sample by calculating a statistic
Timestamps and ranges are issues for every observation.
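The three observation levels, and the timestamp/range issue, can be shown in a few lines: a statistical observation keeps the time span and size of the sample it summarizes so a consumer knows what it covers. The record layout and the `summarize` helper are hypothetical; `statistics.median` is from the Python standard library.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Singleton:
    timestamp: float  # seconds since the epoch when the value was observed
    value: float


@dataclass(frozen=True)
class StatisticalObservation:
    statistic: str
    value: float
    start: float      # earliest singleton timestamp in the sample
    end: float        # latest singleton timestamp in the sample
    count: int        # number of singletons summarized


def summarize(sample: Sequence[Singleton], name: str,
              fn: Callable[[List[float]], float]) -> StatisticalObservation:
    """Derive a statistical observation from a sample, keeping its time range."""
    if not sample:
        raise ValueError("cannot compute a statistic over an empty sample")
    values = [s.value for s in sample]
    times = [s.timestamp for s in sample]
    return StatisticalObservation(name, fn(values), min(times), max(times), len(sample))


sample = [Singleton(100.0, 12.0), Singleton(101.0, 15.0), Singleton(102.0, 13.0)]
print(summarize(sample, "median", statistics.median))
```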
– 26 – Network Entities Attributes must be included. Nodes and paths can be physical or functional.
– 27 – Describing Topology
Two different types of topology:
- Physical: actual links and nodes
- Functional: derived closeness
Attributes define the path or node.
Multiple topologies are superimposed over the physical network.
– 28 – Describing Topology
Paths:
- Path data flows from source to destination.
- Paths are unidirectional in most cases.
- Paths (including hops) may be made of components.
Nodes:
- Hosts and internal nodes
- Physical and functional graphs are not disjoint at the edges.
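A unidirectional path built from hop components might be modeled as below. This is an illustrative sketch: the per-hop `capacity_bps` attribute and the bottleneck rule (path capacity is the minimum hop capacity) are assumptions chosen to show how a path-level characteristic can be derived from its components, and the forward and reverse paths are deliberately separate objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Hop:
    src: str
    dst: str
    capacity_bps: float  # hypothetical per-hop attribute used below


@dataclass(frozen=True)
class Path:
    """A unidirectional path built from hops, ordered source to destination."""
    hops: Tuple[Hop, ...]

    def __post_init__(self) -> None:
        if not self.hops:
            raise ValueError("a path needs at least one hop")
        # Consecutive hops must share a node, so the path is connected.
        for a, b in zip(self.hops, self.hops[1:]):
            if a.dst != b.src:
                raise ValueError(f"hop {a.src}->{a.dst} does not lead to {b.src}->{b.dst}")

    @property
    def source(self) -> str:
        return self.hops[0].src

    @property
    def destination(self) -> str:
        return self.hops[-1].dst

    def bottleneck_capacity(self) -> float:
        """A path-level characteristic derived from its components."""
        return min(h.capacity_bps for h in self.hops)


# The forward and reverse paths are separate objects and may take different routes.
forward = Path((Hop("A", "R1", 1e9), Hop("R1", "B", 100e6)))
reverse = Path((Hop("B", "R2", 1e9), Hop("R2", "A", 1e9)))
print(forward.bottleneck_capacity(), reverse.bottleneck_capacity())
```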
– 29 – Characteristics Overview
– 30 – Relationship Between Measurements
Can we develop systems that use whatever information is available (iperf, pathload, QoS support)?
We need to be able to request measurement of a particular characteristic without regard to which sub-characteristic or tool is used to return the result. For example:
- Convert a loss pattern to a loss rate.
- Convert a traffic profile to a utilization fraction.
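The two conversions named on this slide can be sketched as plain functions. The input formats are assumptions: a loss pattern is a sequence of booleans (True meaning the packet was lost), and a traffic profile is a sequence of (interval in seconds, bytes sent) pairs on a link of known capacity.

```python
from typing import Sequence, Tuple


def loss_rate(pattern: Sequence[bool]) -> float:
    """Collapse a loss pattern (True = packet lost) into a loss fraction."""
    if not pattern:
        raise ValueError("empty loss pattern")
    return sum(pattern) / len(pattern)


def utilization(profile: Sequence[Tuple[float, int]], capacity_bps: float) -> float:
    """Collapse a traffic profile of (interval_s, bytes_sent) into a utilization fraction."""
    total_time = sum(interval for interval, _ in profile)
    total_bits = sum(nbytes * 8 for _, nbytes in profile)
    if total_time <= 0 or capacity_bps <= 0:
        raise ValueError("profile duration and capacity must be positive")
    return total_bits / (capacity_bps * total_time)


print(loss_rate([False, True, False, False]))                         # 0.25
print(utilization([(1.0, 6_250_000), (1.0, 0)], capacity_bps=100e6))  # 0.25
```

Both conversions only run from the more detailed characteristic to the less detailed one: a loss rate cannot be expanded back into the loss pattern it came from, which is why a system asking for "loss" can accept either observation but one asking for a pattern cannot.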
– 31 – Characterization of Tools
The goal of the hierarchy is to make measurements portable.
- The first step is to agree on what characteristic each tool measures.
- Some tools measure multiple characteristics, depending on their parameters.
- Many lists of tools exist, including E2EPI's; our goal is to annotate these lists and produce a hierarchy with multiple views.
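Annotating tools with the characteristics they measure, keyed by tool and mode, gives two views of the same data: what a tool measures, and which tools cover a branch of the hierarchy. The entries below reflect common tool usage (ping reports round-trip time and loss, traceroute lists hops, pathload estimates available bandwidth, iperf reports TCP throughput or, in UDP mode, loss and jitter), but the dotted characteristic labels are hypothetical placeholders, not official NMWG names.

```python
from typing import Dict, FrozenSet, List, Tuple

# Illustrative annotations keyed by (tool, mode); characteristic names are
# hypothetical dotted labels, not official NMWG hierarchy names.
TOOL_CHARACTERISTICS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("ping", "default"): frozenset({"delay.roundTrip", "loss.roundTrip"}),
    ("traceroute", "default"): frozenset({"topology.hops"}),
    ("pathload", "default"): frozenset({"bandwidth.available"}),
    ("iperf", "tcp"): frozenset({"bandwidth.achievable.TCP"}),
    ("iperf", "udp"): frozenset({"loss.oneWay", "delay.jitter"}),
}


def characteristics_of(tool: str, mode: str = "default") -> FrozenSet[str]:
    """Tool-centric view: what does this tool (in this mode) measure?"""
    return TOOL_CHARACTERISTICS.get((tool, mode), frozenset())


def _is_under(characteristic: str, parent: str) -> bool:
    # "bandwidth.available" falls under "bandwidth"; "bandwidthX" does not.
    return characteristic == parent or characteristic.startswith(parent + ".")


def tools_for(parent: str) -> List[Tuple[str, str]]:
    """Characteristic-centric view: which tools measure anything under parent?"""
    return sorted(
        key for key, chars in TOOL_CHARACTERISTICS.items()
        if any(_is_under(c, parent) for c in chars)
    )


print(tools_for("bandwidth"))  # [('iperf', 'tcp'), ('pathload', 'default')]
print(tools_for("loss"))       # [('iperf', 'udp'), ('ping', 'default')]
```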
– 32 – NMWG Upcoming Work
A taxonomy is nice, but exchanging real data requires a schema, with values for attributes and parameters.
Two steps:
- Map tools to the taxonomy.
- Produce a schema.
The schema step is needed to reach the goal of portability. Participants include DAMED members.
– 33 – Summary of GGF Activities
Focus on two aspects:
- system interoperability
- measurement portability
Status:
- GMA completed
- DAMED finishing up the Top-N documents
- NMWG characteristics hierarchy near release
We need a schema to put the components together.
Portions contributed by Jennifer Schopf (ANL), James Magowan (IBM), Brian Tierney (LBNL), and Dan Gunter (LBNL).