An Integrated Instrumentation Architecture for NGI Applications
Ian Foster, Darcy Quesnel, Steven Tuecke
Argonne National Laboratory
The University of Chicago
DOE NGI Instrumentation Project
“A Uniform Instrumentation, Event, and Adaptation Framework for Network-Aware Middleware and Advanced Network Applications”
– With UIUC (Dan Reed, Ruth Aydt)
– “Produce uniform notification and adaptation mechanisms, with the goal of catalyzing the development of both network-aware middleware and sophisticated network-aware applications”
Motivation
• Environment incorporates multiple sensors
  – Sources of events relating to behavior of resources, middleware, and applications
• Significant advantages to having uniform mechanisms for publishing/discovering sensors and for accessing sensor data
  – E.g., find all sensors for path A->B
  – Including historical data
• Enables end-to-end, top-to-bottom, past-to-present analysis
Examples of Sensors
• Network devices
  – E.g., routers
• End system devices
  – E.g., computers, storage systems
• Grid services
  – E.g., Globus HBM, Network Weather Service
• Libraries
  – E.g., CAVERNsoft, MPI
• Applications
For Example...
[Figure: sensors (S) attached at every layer. Layer labels: H/W, Sys, Libs, App. Components shown: H and R nodes with S (SNMP) and S (netstat); GRAM, HBM, NWS, ...; MPICH, globus-io, CAVERNsoft, DPSS.]
Three Project Components
1. Mechanisms for creating, publishing, discovering, and accessing sensors
2. Synthesis and analysis techniques for identifying qualitative behavior and trends in sensor data
3. Adaptation techniques that exploit sensor data to adjust middleware and application configurations to improve performance
Argonne focus: (1) and (3); UIUC: (2), (3)
Current Approach
• Use a directory service (LDAP) to register and publish event sources
  – Publish: source, type, contact [online, archive]
  – Discover: “find all event sources of type X”
• Use NetLogger format for data
• Develop sensor manager to handle publish, subscribe, archiving
• Use SQL database as archive
• Initial sensor set based on Globus libraries, applications, NetLogger-accessible devices
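The publish and discover operations listed above can be sketched with a small in-memory registry. This is only an illustration: the `Directory` class stands in for LDAP, and the field names, host names, and contact strings are hypothetical, not the project's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    """One published event source (field names are illustrative only)."""
    source: str        # e.g. the host the sensor runs on
    kind: str          # event source type, e.g. "netstat"
    contact: str       # where to subscribe for online data (hypothetical format)
    archive: str = ""  # where historical data can be queried, if anywhere


class Directory:
    """In-memory stand-in for the LDAP directory."""

    def __init__(self):
        self._entries = []

    def publish(self, entry):
        self._entries.append(entry)

    def discover(self, kind):
        """Return all event sources of the given type, in publish order."""
        return [e for e in self._entries if e.kind == kind]


directory = Directory()
directory.publish(SourceEntry("hostA", "netstat", "hostA:9000"))
directory.publish(SourceEntry("hostB", "cpu-load", "hostB:9000", archive="archive-1"))
directory.publish(SourceEntry("hostB", "netstat", "hostB:9000"))

found = directory.discover("netstat")
print([e.source for e in found])
```

A real deployment would replace `Directory` with LDAP add and search operations; the sketch only shows the shape of the two calls.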
Initial Instrumentation Architecture
[Figure: architecture diagram. Labels: LDAP, Sensor Manager, SQL, Netarchive, MySQL, File, Application, Sensor, Archive. Labeled flows: Events in NetLogger format; Publish (“netstat, host A, time T, contact X”); Discover (“what event sources for route A to B?”); Subscribe.]
Sensor Manager
• We are building a program which:
  – Archives sensor event streams
  – Redirects sensor event streams to clients using a publish/subscribe interface
  – Generates sensor event streams from archive, based on query language
  – Publishes interfaces and index to LDAP
• Relation to other work
  – Superset of Netlogd (simple archiver)
  – Might exploit Netarchiver (MySQL indexing)
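A minimal sketch of the first three sensor manager roles (archive, redirect to subscribers, regenerate from archive) follows. The class and method names are assumptions made for illustration; the archive is a plain list and the "query language" is reduced to a Python predicate.

```python
from collections import defaultdict


class SensorManager:
    """Toy sensor manager: archives events and forwards them to subscribers."""

    def __init__(self):
        self._archive = []                     # list of (stream, event) pairs
        self._subscribers = defaultdict(list)  # stream name -> callbacks

    def subscribe(self, stream, callback):
        """Register a callback for future events on one stream."""
        self._subscribers[stream].append(callback)

    def handle_event(self, stream, event):
        """Archive an incoming event, then forward it to current subscribers."""
        self._archive.append((stream, event))
        for callback in self._subscribers[stream]:
            callback(event)

    def replay(self, stream, predicate=lambda event: True):
        """Regenerate a stream from the archive, filtered by a predicate."""
        return [e for s, e in self._archive if s == stream and predicate(e)]


manager = SensorManager()
received = []

manager.handle_event("netstat@hostA", {"t": 1, "retransmits": 3})
manager.subscribe("netstat@hostA", received.append)
manager.handle_event("netstat@hostA", {"t": 2, "retransmits": 7})

history = manager.replay("netstat@hostA")
recent = manager.replay("netstat@hostA", lambda e: e["t"] >= 2)
print(len(received), len(history), len(recent))
```

The subscriber only sees events that arrive after it subscribes, while `replay` can still recover the earlier event from the archive.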
Archiving Events
• How to archive sensor event streams?
  – SQL: Save each event as a record in an SQL database
    > Advantage: Rich query support
  – Netarchive: Save each event into file. Use SQL database to build index of file contents
    > Advantage: Performance and scale?
• We will explore the use of SQL databases
  – Premise: Most sensors will not produce high volume event streams; hence optimize for simplicity and rich query support
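To illustrate the "one record per event" option and the kind of query it makes easy, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the SQL archive. The table layout, column names, and sample events are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL archive.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ts REAL, host TEXT, sensor TEXT, name TEXT, value REAL)"
)

# Each event is stored as one row (sample data, made up for this sketch).
rows = [
    (100.0, "hostA", "netstat", "retransmits", 3),
    (101.0, "hostA", "netstat", "retransmits", 7),
    (101.5, "hostB", "netstat", "retransmits", 1),
    (102.0, "hostA", "cpu", "load", 0.42),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# Per-host event count and peak value for one sensor type in a time window.
query = """
    SELECT host, COUNT(*), MAX(value)
    FROM events
    WHERE sensor = ? AND ts BETWEEN ? AND ?
    GROUP BY host
    ORDER BY host
"""
summary = conn.execute(query, ("netstat", 100.0, 102.0)).fetchall()
print(summary)
```

Filtering, grouping, and aggregation come for free with this layout, which is the "rich query support" advantage; the cost is one insert per event, which matters only for high-volume streams.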
Applying Info Infrastructure to Instrumentation
[Figure: panels labeled “Bandwidth/Latency ANL-NASA Ames”, “NCSA Origin Nodes”, “ANL CPU Load”, “Bandwidth/Latency ANL-Indiana”.]
Publishing & Discovering Sensors
• Globus LDAP-based Metacomputing Directory Service (MDS) provides scalable, global infrastructure for publishing and discovering sensor managers
  – Sensors stream events to a sensor manager
  – Sensor manager publishes availability of streams into LDAP
  – Clients discover sensor managers from LDAP, and can subscribe to either current or archived sensor event streams directly from sensor managers
Initial Applications
• Replica creation in “Data Grid” applications
  – Online and historical instrumentation for large data transfers (app, lib, network)
  – Involves DPSS, globus-io
  – Also application-level selection of replicas, based on sensor information
• MPI-based video streaming (Karonis, Papka)
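Application-level replica selection based on sensor information could look roughly like the sketch below. The selection rule (highest predicted throughput wins), the function name, and the host names and numbers are all assumptions for illustration, not the project's actual policy.

```python
def select_replica(replicas, predicted_throughput):
    """Pick the replica host with the highest predicted throughput.

    replicas: list of host names holding a copy of the data.
    predicted_throughput: dict mapping host name -> predicted throughput
        to the client, as reported by sensors.
    Hosts with no sensor data are skipped; returns None if none are known.
    """
    known = [r for r in replicas if r in predicted_throughput]
    if not known:
        return None
    return max(known, key=lambda r: predicted_throughput[r])


# Hypothetical replica locations and sensor-derived predictions.
replicas = ["replica1.example.org", "replica2.example.org", "replica3.example.org"]
predictions = {"replica1.example.org": 12.5, "replica2.example.org": 40.0}

best = select_replica(replicas, predictions)
print(best)
```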
Security
• Grid Security Infrastructure (GSI) will be used throughout, hence possible to say e.g.
  – “Manager M accepts only streams from sensors of user U”
  – “Manager N only publishes streams to clients of users A, B, C”
• As a first step, we have augmented the NetLogger C client with GSI
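The two example policies can be expressed as simple per-manager access lists. The sketch below assumes authentication has already produced a trusted user name; it makes no GSI calls, and the class and method names are hypothetical.

```python
class ManagerPolicy:
    """Access policy for one sensor manager, keyed by authenticated user name.

    Assumption: the caller has already authenticated the peer and passes in
    a trusted user name; this sketch does not perform authentication itself.
    """

    def __init__(self, sensor_users=None, client_users=None):
        # None means "no restriction"; otherwise only listed users pass.
        self.sensor_users = None if sensor_users is None else set(sensor_users)
        self.client_users = None if client_users is None else set(client_users)

    def accepts_stream_from(self, user):
        """May a sensor owned by this user stream events to the manager?"""
        return self.sensor_users is None or user in self.sensor_users

    def publishes_to(self, user):
        """May a client owned by this user receive streams from the manager?"""
        return self.client_users is None or user in self.client_users


# "Manager M accepts only streams from sensors of user U"
manager_m = ManagerPolicy(sensor_users={"U"})
# "Manager N only publishes streams to clients of users A, B, C"
manager_n = ManagerPolicy(client_users={"A", "B", "C"})

print(manager_m.accepts_stream_from("U"), manager_m.accepts_stream_from("V"))
print(manager_n.publishes_to("B"), manager_n.publishes_to("D"))
```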
Instrumentation Architecture Showing Actuators
[Figure: architecture diagram extended with actuators. Labels: LDAP, Sensor Manager, SQL, Netarchive, MySQL, File, Sensor, Actuator, Monitor. Labeled flows: Publish, Discover, Subscribe, Events.]
Future Directions
• XML
  – NetLogger is an ASCII-based format
  – If you are using ASCII, why not use XML?
  – XML database could be used for archive
• Events
  – Performance-related events should be just one part of a larger, integrated event system
• Typing
  – NetLogger is weakly typed
  – Various advantages to strongly typed events
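As a rough illustration of the XML idea, the sketch below converts one ASCII event line into an XML element. It assumes events are lines of space-separated KEY=VALUE fields with no spaces inside values; the sample line, the field names in it, and the `<event>`/`<field>` element names are made up for this sketch.

```python
import xml.etree.ElementTree as ET


def parse_event(line):
    """Split a line of space-separated KEY=VALUE fields into a dict.

    Assumes values contain no spaces; every value stays a string,
    which reflects the weak typing of the ASCII format.
    """
    return dict(field.split("=", 1) for field in line.split())


def to_xml(fields):
    """Render the fields as <event><field name="...">value</field>...</event>."""
    event = ET.Element("event")
    for key, value in fields.items():
        ET.SubElement(event, "field", name=key).text = value
    return ET.tostring(event, encoding="unicode")


# Hypothetical event line, for illustration only.
line = "DATE=20000101120000.000000 HOST=hostA PROG=demo LVL=Usage NL.EVNT=SEND_START"
fields = parse_event(line)
xml_text = to_xml(fields)
print(xml_text)
```

A strongly typed variant would attach a type to each field (for example a timestamp for the date and an enumerated value for the level) instead of leaving everything as text.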
Future Directions (2): Publish/Subscribe for Sensors
• In first version:
  – NetLogger-based sensors stream events to manager
  – Manager publishes sensor availability to LDAP
  – Clients subscribe to sensor manager for events
• In later version:
  – Sensor can publish existence to LDAP
  – Client can subscribe directly to sensor for events
Network Weather Service (R. Wolski et al., U. Tenn)
• Scalable, fault tolerant system for
  – Real-time performance measurements
  – Predictions of future state
• When installed on N hosts, delivers:
  – Network performance (≤ N² via netperf)
  – Host CPU-load measurements (N)
• We (USC/ISI crew) are working to integrate this into MDS; hopefully will eventually be consistent with approach described here (to be discussed)
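The scaling of the two measurement sets with N can be checked with a few lines of arithmetic. The sketch assumes one network measurement per ordered (source, destination) pair of distinct hosts, which gives N(N-1) paths and therefore stays under the N² bound; the host names are placeholders.

```python
from itertools import permutations


def measurement_counts(hosts):
    """Return (network_paths, cpu_entries) for a set of monitored hosts.

    Assumes one network measurement per ordered pair of distinct hosts
    and one CPU-load measurement per host.
    """
    paths = list(permutations(hosts, 2))  # ordered (source, destination) pairs
    return len(paths), len(hosts)


hosts = ["h1", "h2", "h3", "h4"]
network_paths, cpu_entries = measurement_counts(hosts)
print(network_paths, cpu_entries, len(hosts) ** 2)
```

The quadratic growth in network entries is what makes the directory layout for these measurements worth thinking about, while the CPU entries grow only linearly.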
Structure of NWS data in MDS (old)
Directory tree labels: c=US; o=ISI, o=Globus; nn= the Internet

Entry hs=source.isi.edu to destination.anl.gov (N² network performance entries for N hosts):
  source: hn=source.isi.edu, o=ISI, c=US
  destination: hn=destination.anl.edu, o=ANL, c=US
  serviceProvider: NWS
  throughput:
  throughput_prediction:
  throughput_MSE: 0.95
  latency: 5.3
  latency_throughput: 6.1
  latency_MSE: 0.04

Entry hn=source.isi.edu (N sets of cpu info for N hosts):
  current_cpu:
  current_cpu_prediction:
  current_cpu_MSE:
  weighted_cpu
  weighted_cpu_prediction:
  weighted_cpu_MSE:
...