S3: A Scalable Sensing Service for Monitoring Large Networked Systems
Praveen Yalagandula, Puneet Sharma, Sujata Banerjee, Sung-Ju Lee, Sujoy Basu
Mobile & Media Systems Lab, HP Labs
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
11 June 2015
Motivation
− Traditional access to network state
  − Only network admins can probe router MIBs
  − SNMP-based data collection and analysis
  − On-path sensing (e.g., TCP, RTP)
− Emerging distributed applications
  − Need a comprehensive picture of the network in real time
    − For reliability and better performance
    − To meet QoS guarantees
  − Examples
    − CHART: Next-Generation Internet Control Plane
    − Mobile Streaming Media Systems
CHART: Control for High-throughput Adaptive Resilient Transport
[Figure: P2P overlay network between hosts A and B, showing client-end daemons, overlay links, and physical links; one link is annotated with a 10% loss rate]
− Adaptive routing under fine-grained, real-time control
− Constant, pervasive sensing of network conditions
− Rapid routing response to link failures and degradations
− GOAL: Improve end-to-end TCP/IP performance 10x under multiple communication link impairments
− DARPA-funded joint project with UC Berkeley, Princeton, George Mason University, UC Santa Barbara, and Anagran
Mobile Streaming Media System
[Figure: a client served by streaming server A or streaming server B, with high cross traffic on one path]
− Automatic server selection depending on network conditions
− Imperceptible to the end user
− New media services introduced mid-session, depending on need
What do these applications need?
− Sensing feedback that is:
  − Responsive: seconds instead of minutes
  − Scalable: low overhead even for ubiquitous sensing
  − Shareable: between different applications, and between different components of an application
  − Robust: adapts to infrastructure failures
  − Flexible and extensible: per-flow and per-application, to meet multiple application requirements
S3: A Scalable Sensing Service
− Goal: provide system state in real time
  − Both individual network and node state
− Monitor actively and passively
  − End-to-end (E2E), but leverage network element information where possible
− Flexible interfaces
  − Configurable measurement time scales
  − Support for complex queries
    − To which node do I have large bandwidth?
    − Which node with file “foo” is within 10 ms latency?
− Share measurement information across applications
  − Eliminate redundant, expensive measurements
− Scalable, secure, and reliable
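The two example queries on this slide can be illustrated with a short sketch. It assumes the service has already collected (or inferred) each node's latency, available bandwidth, and advertised files from the querying node's point of view. `NodeView` and its field names are hypothetical and do not represent S3's actual query interface or schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class NodeView:
    """Hypothetical per-node record as seen from the querying node.

    Field names are illustrative assumptions, not S3's actual schema.
    """
    name: str
    latency_ms: float         # measured or inferred latency to this node
    avail_bw_mbps: float      # available bandwidth on the path to this node
    files: FrozenSet[str]     # files this node advertises


def best_bandwidth(views: List[NodeView]) -> str:
    """To which node do I have the largest available bandwidth?"""
    return max(views, key=lambda v: v.avail_bw_mbps).name


def nodes_with_file_within(views: List[NodeView], filename: str,
                           max_latency_ms: float) -> List[str]:
    """Which nodes holding `filename` are within `max_latency_ms`?"""
    return sorted(v.name for v in views
                  if filename in v.files and v.latency_ms <= max_latency_ms)


views = [
    NodeView("a", 8.0, 40.0, frozenset({"foo"})),
    NodeView("b", 25.0, 90.0, frozenset({"foo", "bar"})),
    NodeView("c", 4.5, 12.0, frozenset({"bar"})),
]
print(best_bandwidth(views))                       # b
print(nodes_with_file_within(views, "foo", 10.0))  # ['a']
```

The second query combines a node attribute (the file) with a path metric (latency). This is why the slide asks for shared, aggregated state rather than per-application probing.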
Outline
− Motivation for a sensing service
− S3: Scalable Sensing Service
  − Goals
  − Architecture
    − Sensing Pods
    − Backplane
    − Scalable Inferencing Engines
− Deployment: All-pair network metrics on PlanetLab
− Summary and future work
Sensor Pod
[Figure: sensor pod components: a secure Web interface; a controller; sensors for latency, loss rate, bandwidth, load, capacity, and memory; a configuration and data repository; an SNMP agent; and an API for query, control, and notification]
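The pod API in this figure offers three kinds of calls: query, control, and notification. A minimal sketch of how they might fit together follows. The method names, the period-based control setting, and the threshold rule for notifications are all hypothetical. The secure Web interface and the SNMP agent are omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[str, float], None]


class SensorPod:
    """Minimal sketch of a pod's query, control, and notification API.

    Method names and the threshold-based notification rule are
    hypothetical; they only illustrate the three kinds of calls.
    """

    def __init__(self) -> None:
        self.repository: Dict[str, List[float]] = {}  # sensor -> archived readings
        self.config: Dict[str, float] = {}            # sensor -> period in seconds
        self.subscribers: List[Tuple[str, float, Callback]] = []

    def control(self, sensor: str, period_s: float) -> None:
        """Control: set how often a sensor should take measurements."""
        self.config[sensor] = period_s

    def subscribe(self, sensor: str, threshold: float, callback: Callback) -> None:
        """Notification: call `callback` whenever a reading exceeds `threshold`."""
        self.subscribers.append((sensor, threshold, callback))

    def record(self, sensor: str, value: float) -> None:
        """Called by a sensor when a new reading is available."""
        self.repository.setdefault(sensor, []).append(value)
        for name, threshold, callback in self.subscribers:
            if name == sensor and value > threshold:
                callback(sensor, value)

    def query(self, sensor: str) -> Optional[float]:
        """Query: latest archived reading, or None if there is none."""
        readings = self.repository.get(sensor)
        return readings[-1] if readings else None


alerts: List[Tuple[str, float]] = []
pod = SensorPod()
pod.control("lossrate", period_s=5.0)
pod.subscribe("lossrate", 0.05, lambda s, v: alerts.append((s, v)))
pod.record("lossrate", 0.01)
pod.record("lossrate", 0.10)
print(pod.query("lossrate"), alerts)  # 0.1 [('lossrate', 0.1)]
```

Notifications let an application such as CHART react to a loss spike without polling the pod.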
Sensor Pods
− Web Service (WS)-enabled collection of sensors
  − Goal is to provide a flexible and extensible framework
  − Focus is not on building new measurement tools
− Allows easy plugging-in of new sensors
− WS enables composition of aggregate sensors
  − E.g., Spruce needs the path capacity, so it uses Pathrate
− Archival of sensing information
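Sensor composition (Spruce relying on Pathrate for capacity) can be sketched as a registry in which each sensor declares the sensors it depends on. The registry, the 1500-byte probe size, and the output-gap value are assumptions for illustration. The capacity sensor is a stub rather than a real Pathrate run. The available-bandwidth function applies the gap-model formula that Spruce is based on, A = C × (1 − (Δout − Δin)/Δin), to made-up numbers.

```python
from typing import Callable, Dict, List, Sequence

PACKET_BITS = 1500 * 8  # assumed probe packet size (1500 bytes)


class SensorRegistry:
    """Hypothetical plug-in registry: a sensor is a function of the path plus
    the outputs of the sensors it depends on (composition)."""

    def __init__(self) -> None:
        self.sensors: Dict[str, Callable[..., float]] = {}
        self.deps: Dict[str, List[str]] = {}

    def register(self, name: str, fn: Callable[..., float],
                 deps: Sequence[str] = ()) -> None:
        self.sensors[name] = fn
        self.deps[name] = list(deps)

    def measure(self, name: str, path: str) -> float:
        # Run the dependencies first and pass their results as arguments.
        inputs = [self.measure(dep, path) for dep in self.deps[name]]
        return self.sensors[name](path, *inputs)


def capacity_stub(path: str) -> float:
    """Stand-in for a Pathrate run; returns capacity in Mbps."""
    return 100.0


def spruce_like(path: str, capacity_mbps: float) -> float:
    """Spruce-style gap model: A = C * (1 - (gap_out - gap_in) / gap_in).

    gap_in is the packet's transmission time at capacity C; gap_out is a
    made-up value standing in for the measured average output gap.
    """
    gap_in_ms = PACKET_BITS / (capacity_mbps * 1e3)  # 0.12 ms at 100 Mbps
    gap_out_ms = 0.15
    return capacity_mbps * (1 - (gap_out_ms - gap_in_ms) / gap_in_ms)


reg = SensorRegistry()
reg.register("capacity", capacity_stub)
reg.register("avail_bw", spruce_like, deps=["capacity"])
print(round(reg.measure("avail_bw", "A->B"), 1))  # 75.0
```

A new sensor plugs in with a single `register` call, and composition falls out of the declared dependencies. A real pod would also cache the capacity reading so that it is not re-measured on every available-bandwidth request.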
Backplane
[Figure: Sensing Information Management Backplane]
Sensing Backplane
− Programmable middleware
  − Aggregates data from end-points
  − Distributes configurations to end-points
  − Configurable and self-managing
− Exploring SDIMS [Yalagandula et al., SIGCOMM’04]
  − DHT-based information management middleware
  − Scalable: with both nodes and attributes
  − Flexible: supports different aggregation strategies
  − Reliable: self-configures in the face of failures
  − Adaptive: efficiently handles dynamic load patterns
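Backplane aggregation can be sketched as hierarchical aggregation over a tree, where each internal node combines its children's partial results with a pluggable function. This is only the aggregation step. In SDIMS the tree for each attribute is derived from the DHT's routing, while here `TREE` is a hypothetical, hand-written hierarchy and the load values are made up.

```python
from typing import Callable, Dict, List

# Hypothetical aggregation tree. In SDIMS, the tree for an attribute is
# derived from the DHT's routing; here it is written out by hand.
TREE: Dict[str, List[str]] = {
    "root": ["r1", "r2"],
    "r1": ["n1", "n2"],
    "r2": ["n3", "n4"],
}


def aggregate(node: str, local: Dict[str, float],
              fn: Callable[[List[float]], float]) -> float:
    """Combine a node's own value (if any) with its children's aggregates."""
    values = [aggregate(child, local, fn) for child in TREE.get(node, [])]
    if node in local:
        values.append(local[node])
    return fn(values)


# Leaf readings, e.g. CPU load reported by four sensor pods.
load = {"n1": 0.2, "n2": 0.9, "n3": 0.4, "n4": 0.1}

print(aggregate("root", load, max))                  # 0.9
print(aggregate("root", load, min))                  # 0.1
print(aggregate("root", {n: 1 for n in load}, sum))  # 4
```

Passing the function as a parameter mirrors the "flexible: supports different aggregation strategies" point. The approach only works for functions that compose over partial results (max, min, sum, count), not for ones like an exact median.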
Scalable Inference Engines
− Difficult to collect complete sensing information
  − Large overhead for probing and data exchange
  − E2E properties require gathering information from multiple sensors
  − Measurement/monitoring failures
− Inference based on incomplete information
  − Exploit properties such as the triangle inequality, e.g., for latency
  − Bayesian techniques, e.g., for hop counts
− Prediction based on archived information
  − E.g., Network Weather Service
− A coarse estimate may suffice for many applications
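The triangle-inequality idea can be made concrete for latency. If d(a,c) and d(c,b) are known for some intermediate node c, then |d(a,c) − d(c,b)| ≤ d(a,b) ≤ d(a,c) + d(c,b). Taking the tightest bounds over all such c gives a coarse estimate of an unmeasured path. The sketch below assumes the inequality holds. Real Internet latencies do sometimes violate it, so the result is a heuristic range rather than a guarantee. Node names and RTT values are made up.

```python
from typing import Dict, Optional, Tuple


def latency_bounds(a: str, b: str,
                   rtt: Dict[Tuple[str, str], float]) -> Tuple[float, float]:
    """Bound the unmeasured latency d(a, b) using every node c for which
    both d(a, c) and d(c, b) are known, assuming the triangle inequality
    |d(a,c) - d(c,b)| <= d(a,b) <= d(a,c) + d(c,b) holds."""
    def d(x: str, y: str) -> Optional[float]:
        # Latency is treated as symmetric, so look up either direction.
        return rtt.get((x, y), rtt.get((y, x)))

    lo, hi = 0.0, float("inf")
    nodes = {n for pair in rtt for n in pair} - {a, b}
    for c in nodes:
        ac, cb = d(a, c), d(c, b)
        if ac is None or cb is None:
            continue
        lo = max(lo, abs(ac - cb))
        hi = min(hi, ac + cb)
    return lo, hi


measured = {("a", "l1"): 10.0, ("l1", "b"): 30.0,
            ("a", "l2"): 25.0, ("l2", "b"): 12.0}
print(latency_bounds("a", "b", measured))  # (20.0, 37.0)
```

Here the lower bound comes from l1 and the upper bound from l2. Each additional measured intermediate can only narrow the range.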
Inferencing
− Latency and proximity estimation
  − Coordinates: e.g., GNP, Vivaldi
  − Landmark-based: e.g., Netvigator, GNP, Lighthouse
  − Others: e.g., Meridian, IDMaps, King
− Loss rate
  − Subset path selection: NetQuest
− Bandwidth: non-additive and highly dynamic, hence hard to infer
  − Route-sharing model: BRoute
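As an example of the coordinate-based approach, the sketch below shows the simple, constant-timestep form of the Vivaldi update. After measuring an RTT to node j, node i moves its coordinate along the line to j in proportion to the prediction error. The full Vivaldi algorithm also uses adaptive timesteps, per-node error estimates, and a height component, none of which appear here. The 2-D space and `delta = 0.25` are illustrative choices.

```python
import math
import random
from typing import List


def vivaldi_update(xi: List[float], xj: List[float], rtt: float,
                   delta: float = 0.25) -> List[float]:
    """One constant-timestep Vivaldi step for node i after measuring `rtt`
    to node j at coordinate `xj`:
        x_i <- x_i + delta * (rtt - ||x_i - x_j||) * u(x_i - x_j)
    where u() is the unit vector pointing from x_j to x_i."""
    diff = [a - b for a, b in zip(xi, xj)]
    dist = math.hypot(*diff)
    if dist == 0.0:
        # Coincident coordinates: pick a random direction to move along.
        diff = [random.uniform(-1.0, 1.0) for _ in xi]
        dist = math.hypot(*diff)
    err = rtt - dist  # positive: too close, move away; negative: move closer
    return [a + delta * err * (c / dist) for a, c in zip(xi, diff)]


x_i, x_j = [1.0, 0.0], [0.0, 0.0]
for _ in range(50):
    x_i = vivaldi_update(x_i, x_j, rtt=10.0)
print(round(math.dist(x_i, x_j), 3))  # 10.0
```

Once coordinates settle, any pair's latency can be predicted as the distance between their coordinates, without probing that pair.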
S3: Architecture
− Sensor pods
  − Collection of sensors
  − Measure system state from a node’s view
− Backplane
  − Programmable fabric
  − Connects pods and aggregates measured system state
− Inference engines
  − Infer E2E information for all O(n²) paths by measuring only a few paths
  − Schedule measurements on pods
  − Aggregate data on the backplane
− Applications
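The gap between O(n²) paths and "a few" measurements is easy to quantify. Among n nodes there are n(n − 1)/2 unordered E2E paths. A landmark-style scheme, in which each node probes only k landmarks, needs on the order of n·k probes instead. The sizes below are hypothetical and are not taken from the PlanetLab deployment.

```python
def all_pairs(n: int) -> int:
    """Unordered end-to-end paths among n nodes: n(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def landmark_probes(n: int, k: int) -> int:
    """Probes needed if each node measures only to k landmarks: O(n*k)."""
    return n * k


n, k = 300, 10  # illustrative sizes only, not the PlanetLab deployment's
print(all_pairs(n), landmark_probes(n, k))  # 44850 3000
```

With these numbers, the inference engine would fill in roughly fifteen times as many paths as it measures, and the ratio grows linearly with n.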
Outline
− Motivation for a sensing service
− S3: Scalable Sensing Service
  − Goals and challenges
  − Architecture
    − Sensing Pods
    − Backplane
    − Scalable Inferencing
− Deployment: All-pair network metrics on PlanetLab
− Future work
Deployment: PlanetLab
− Early January 2006
− All-pair network metrics
  − Latency: inferred by Netvigator
  − Loss rate: measured using the Tulip loss rate tool
  − Available bandwidth: measured using Spruce and PathChirp
  − Capacity: measured using Pathrate
− Stats: ~14 GB of raw data every day (~1 GB compressed)
Screenshot: Hop-by-hop loss sensor
S3 Screenshot
Network Visualizer: NOC on a Laptop
CHART: S3 in Action
Ongoing work
− Security model
− Admission control for S3
  − Limit network and node usage
− Efficient and accurate bandwidth inference
− Validation with more applications