Office of Science, MICS Division, Department of Energy
High-Performance Networking Research Program
Program Manager: Thomas D. Ndousse
Tel:
Project Quad Charts
Bandwidth Estimation: Methodologies and Applications
k claffy, CAIDA at SDSC & Constantinos Dovrolis, Univ. of Delaware

Brief Summary of the Project
Task 1: Develop accurate, fast, and non-intrusive bandwidth estimation (bwest) methodologies and measurement tools.
Task 2: Compare and evaluate different bwest tools (both for end-to-end and per-hop bandwidth metrics), characterizing any observed errors.
Task 3: Use bwest methodologies in transport protocols and applications to optimize throughput for high bandwidth-delay-product paths.
Task 4: Prototype bwest middleware to monitor performance between network domains in real time.

The Novel Ideas
Innovative end-to-end probing techniques to measure capacity (maximum possible throughput on an empty path) and available bandwidth (maximum throughput under current load):
– Packet Train Dispersion (PTD)
– Variable Packet Size (VPS)
– Self-Loading Periodic Streams (SLoPS)
Methodologies to check for overbuffered or underbuffered network paths.
Smooth pacing in TCP, driven by bwest measurements.
Smooth bwest-driven rate control for UDP-based applications.

Milestones/Dates/Status
Compare and evaluate existing bwest tools:
- Hop-by-hop tool survey: Jun01 - Aug02
- End-to-end tool survey: Jun01 - Jun02
Bandwidth measurement middleware:
- Create/maintain testbed: Jun01 - Jun04
- Collect link characteristics: Jun01 - Jun04
- Correlate active/passive measurements: Jun01 - Jun04
Capacity estimation tool (pathrate) v2.1.2: Dec01 (DONE)
- Add GUI to aid analysis of results: Dec02
Available bandwidth tool (pathload): Mar02
- Paper at PAM’02: Mar02
Develop UDP-based rate-controlled file transfer app driven by bwest measurements: Dec02
Real-time path monitor using bwest middleware: Dec03

Impact and Connections
IMPACT: Allow scientific applications (transferring terabytes of data) to efficiently use high-performance networks.
– Use explicit bwest measurements instead of implicit bwest via TCP’s congestion control algorithms.
– Provide easy-to-use tools for monitoring network path performance.
CONNECTIONS:
– Apply bwest methodologies to the Web100 and Net100 projects.
– Correlate bwest with loss/delay (e.g. the PingER project).
– Establish prototype bwest middleware in ESnet and for DOE labs and investigators.

6/2/2015 2:23:51 AM
High-Performance Network Research SciDAC Project
MICS Program Manager: Thomas Ndousse
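The dispersion idea behind Packet Train Dispersion can be illustrated with a short calculation: when back-to-back packets of size L cross the narrowest link, they leave it spaced by roughly L divided by the link capacity, so capacity can be estimated as L divided by the measured spacing. The sketch below is only an illustration of that arithmetic under idealized assumptions (no cross traffic, accurate timestamps). The function name and the use of the median are choices made here and are not taken from pathrate, which analyzes many more measurements than this.

```python
from statistics import median


def capacity_estimate_bps(packet_size_bytes, dispersions_s):
    """Estimate path capacity in bits/s from packet dispersion samples.

    packet_size_bytes: size of each probe packet (hypothetical parameter).
    dispersions_s: measured gaps, in seconds, between consecutive probe
        packets at the receiver.
    The median is used here only as a simple way to damp outliers
    caused by cross traffic; it is an assumption of this sketch.
    """
    if not dispersions_s:
        raise ValueError("need at least one dispersion sample")
    if any(d <= 0 for d in dispersions_s):
        raise ValueError("dispersion samples must be positive")
    return packet_size_bytes * 8 / median(dispersions_s)


if __name__ == "__main__":
    # 1500-byte probes spaced 120 microseconds apart suggest about 100 Mb/s.
    samples = [0.00012, 0.00012, 0.0003]
    print(f"{capacity_estimate_bps(1500, samples) / 1e6:.1f} Mb/s")
```

Available bandwidth (what SLoPS targets) cannot be read off this way, since it depends on the current load rather than on the narrow link alone.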
Security and Policy for Group Collaboration
Steven Tuecke, Argonne National Laboratory; Carl Kesselman, USC Information Sciences Institute; Miron Livny, U. Wisconsin, Madison

Impact and Connections
IMPACT: We expect this project to result in:
- Standardization of new PKI-based approaches to credential management, restricted delegation, and policy management
- Development of security tools and services for collaboration
- Widespread deployment and adoption of approaches and tools
CONNECTIONS:
- This work builds on the Globus Toolkit’s widely used Grid Security Infrastructure (GSI), and will be included in future Globus Toolkit releases.
- To be used by numerous SciDAC collaboratories, including DOE Science Grid, Particle Physics Data Grid, Earth Systems Grid, and Fusion Collaboratory
- Also to be used by many non-DOE projects worldwide, including NSF PACI DTF, NASA IPG, and European Data Grid

Milestones/Dates/Status
- Demonstrate CAS, SC’01: November 2001
- Complete X.509 & GSS standards drafts: February 2002
- Deliver draft standard conforming GSS: April 2002
- Deliver CAS w/ simple policies: May 2002
- Demonstrate Online CA & CR: September 2002
- Complete Online CA & CR standards drafts: December 2002
- Finalize X.509 & GSS standards: February 2003
- Deliver Online CA & CR: March 2003
- Deliver CAS w/ rich policy & app support: May 2003
- Finalize Online CA & CR standards: December 2003
- Deliver standards-based Online CA & CR: March 2004
- Deliver CAS w/ accounting support: May 2004

The Novel Ideas
Enable collaborative work, with common security tools that address:
- Large, geographically & organizationally distributed membership
- Membership with diverse expertise, comprising different roles
- Community resources with associated community policies
Develop novel tools and approaches for:
- Management of collaboration membership and resources
- Online CA & Credential Repository (CR), local security integration
- Management of roles and privileges
- Community Authorization Service (CAS), restricted delegation
- Integration into collaborative tools and environments

September 2001
High-Performance Network Research SciDAC Project
MICS Program Manager: Thomas Ndousse

[Figure: Community Authorization Service]
1. CAS request, with resource names and operations (User to CAS). The CAS asks: does the collective policy authorize this request for this user? It consults user/group membership, resource/collective membership, and collective policy information.
2. CAS reply, with capability and resource CA info (CAS to User).
3. Resource request, authenticated with capability (User to Resource). The Resource asks: is this request authorized for the CAS? Is this request authorized by the capability? It consults local policy information.
4. Resource reply (Resource to User).
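The four-step Community Authorization Service exchange can be sketched as a toy model to show where each policy check happens. Everything here is hypothetical: the class names, the dictionary-based policies, and the plain-object capability are stand-ins chosen for illustration. A real deployment would carry the capability in signed credentials, and this sketch performs no cryptography or authentication at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Rights the CAS grants a user: a set of (resource, operation) pairs."""
    user: str
    rights: frozenset


class CommunityAuthorizationService:
    def __init__(self, collective_policy):
        # collective_policy: {user: {(resource, operation), ...}}
        self.collective_policy = collective_policy

    def request(self, user, wanted):
        """Steps 1 and 2: check the collective policy, reply with a capability."""
        allowed = self.collective_policy.get(user, set())
        granted = frozenset(w for w in wanted if w in allowed)
        if not granted:
            raise PermissionError(f"collective policy grants {user} nothing requested")
        return Capability(user, granted)


class Resource:
    def __init__(self, name, local_policy):
        self.name = name
        # local_policy: operations this resource lets the CAS community perform
        self.local_policy = local_policy

    def handle(self, capability, operation):
        """Steps 3 and 4: apply both resource-side checks, reply allow/deny."""
        authorized_for_cas = operation in self.local_policy
        authorized_by_capability = (self.name, operation) in capability.rights
        return authorized_for_cas and authorized_by_capability


if __name__ == "__main__":
    cas = CommunityAuthorizationService({"alice": {("dataset", "read")}})
    cap = cas.request("alice", [("dataset", "read"), ("dataset", "write")])
    dataset = Resource("dataset", {"read", "write"})
    print(dataset.handle(cap, "read"))   # allowed by both checks
    print(dataset.handle(cap, "write"))  # denied: not in the capability
```

The point of the split is that the resource owner keeps final say through local policy, while the community manages fine-grained rights for its own members.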
INCITE: Edge-based Traffic Processing and Inference for High-Performance Networks
Richard Baraniuk, Rice University; Les Cottrell, SLAC; Wu-chun Feng, LANL

Impact and Connections
IMPACT:
- Optimize performance of demanding applications such as remote visualization and high-capacity data transfers
- New understanding of the complex dynamics of large-scale, high-speed networks
- New edge-based tools to characterize and map network performance as a function of space, time, application, protocol, and service
CONNECTIONS: Rice/SLAC/LANL synergy, SciDAC

Milestones
Analysis, modeling, and inference
- Multifractal, wavelet, tomography theory: ongoing
- Traffic analysis toolbox: 12/02
- Passive path inference and tomography algs: 10/03
PingER
- Add tomography, chirping, fat boy: 04/02
- Port extended PingER to Rice/LANL: 10/02
- Add new inference algs to PingER-NG: 06/03
- Evaluate, port PingER-NG to GIMI/NMF: 04/04
MAGNeT / TICKET
- MAGNeT, TICKET (alpha distribution): 10/02
- High-speed, high-utilization traffic traces: 09/02
- MAGNeT (public availability): 06/03

INCITE Summary
Task 1: Multiscale traffic analysis and modeling
Task 2: Inference algorithms for network paths and links
Task 3: Network tomography
Task 4: Active network measurement: PingER
Task 5: Passive network measurement: MAGNeT, TICKET
Task 6: Passive path monitoring and tomography toolkit

INCITE Novel Ideas
- Multiscale / multifractal analysis for traffic bursts
- Efficient “packet chirp” and “fat boy” path probing
- Active and passive network tomography
- Monitor for Application-Generated Network Traffic (MAGNeT)
- Traffic Information Collecting Kernel with Exact Timing (TICKET)
- Augmented PingER

incite.rice.edu
Date Prepared: 10 Jan 02
High-Performance Network Research SciDAC Project
MICS Program Manager: Thomas Ndousse
Logistical Networking
PIs: Micah Beck, Jack Dongarra, James S. Plank / Tennessee; Rich Wolski / UCSB

Impact and Connections
IMPACT:
- Improved performance and scalability of data-intensive distributed applications
- Greater ease and lower cost of deploying new wide-area data management strategies
- Dramatically improved flexibility in data-intensive collaboration
CONNECTIONS:
SciDAC: Net100, Data Grid, Scalable Systems, Data Mgt, Computational Science (e.g. Climate, Supernovas)
Base: Network Monitoring, Data Grid, Transport Protocols, Storage Res. Mgt., IQ-Echo

Milestones/Dates/Status
– IBP applications demonstrated at SC’01
– exNode support in NetSolve
– Reliability/performance coscheduling alpha
– Allocation policy simulation
– Initial generalized caching infrastructure
– Initial logistical overlay network on ESnet
– Wide-area logistical peering mechanisms and policies
– Resolution for highly volatile storage resources
– Experimental IBP architectures
– Large-scale measurement and simulations
Timeline columns: 6-12 mos, 12 mos, 12-18 mos, 18-36 mos

Novel Ideas
- Storage is too cheap to hoard.
- Storage can be a scalably shared network resource.
- Logistical Networking gives applications and middleware uniform control over buffering and routing of data.
- Data storage and data transport can be viewed as points on a spectrum of data management mechanisms.
- Monitoring and prediction can replace reservation as a means of scheduling storage resources.
- End-to-end networking principles can apply to storage.

Logistical Networking: Developing a communicative infrastructure with persistence
Tasks:
- develop/deploy network storage depots
- develop layered storage stack & tools
- develop/validate scheduling techniques
- optimize application performance

loci.cs.utk.edu
Date Prepared: 1/10/02
High-Performance Network Research SciDAC Project
MICS Program Manager: Thomas Ndousse
Net100
PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL

Impact and Connections
IMPACT:
- increase throughput of bulk transfers over high-delay, high-bandwidth networks (like DOE’s ESnet)
- select optimal paths and transport parameters for distributed (Grid) applications (e.g. GridFTP)
- provide a network performance data base from active and passive monitoring
CONNECTIONS:
SciDAC: Astrophysics, Bandwidth Estimation, Data Grid, INCITE, Logistical Networking
Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status (Mon/Yr, DONE)
Network probes and sensors
- initial sensor and tool deployment: 12/01 (done 12/01)
- data base design: 4/02
- initial data base implementation: 9/02
- final sensor/data base: 6/03
Transport protocol optimizations
- protocol analysis: 11/02
- initial tuning daemon: 3/02
- bulk transfer tuning demos: 8/02
- final tuning daemon: 6/03
Multipath support
- analytical analysis: 8/02
- proof-of-principle routing daemons: 12/02
- grid applications demos: 4/03

Net100 Novel Ideas
- Net100 will tune network-UNaware applications based on recent and current link characteristics.
- Net100 will tune more than just transport buffer sizes, such as:
  - TCP AIMD parameters
  - DUP threshold
  - Delayed ACK
- Net100 will determine optimal paths and whether to use multiple streams and/or multiple paths.
- The Net100 kernel uses passive monitoring from the Web100 kernel.

NET100: Developing network-aware operating systems
Tasks:
- develop/deploy network probes/sensors
- develop network metrics data base
- develop transport protocol optimizations
- develop network-tuning daemon

Date Prepared: 1/7/02
High-Performance Network Research - Base Project
MICS Program Manager: Thomas Ndousse
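To make the "TCP AIMD parameters" item above concrete, here is a small model of additive-increase, multiplicative-decrease window evolution, showing what the two tunable values control. It is a sketch under simplifying assumptions: one update per round-trip time, a window measured in segments, and no slow start or timeouts. The function and parameter names are hypothetical and do not describe the Net100 tuning daemon's interface.

```python
def aimd_trace(loss_events, cwnd=10.0, increase=1.0, decrease=0.5):
    """Return the congestion window after each round trip.

    loss_events: iterable of booleans, True if a loss was seen in that RTT.
    increase: segments added per loss-free RTT (additive increase).
    decrease: factor applied to the window on loss (multiplicative decrease).
    The window never drops below one segment.
    """
    trace = []
    for lost in loss_events:
        if lost:
            cwnd = max(1.0, cwnd * decrease)
        else:
            cwnd += increase
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    pattern = [False, False, True, False]
    print(aimd_trace(pattern))                               # standard-style 1 / 0.5
    print(aimd_trace(pattern, increase=4.0, decrease=0.75))  # a more aggressive tuning
```

On a long, fast path a single loss halves a very large window, and recovering one segment per round trip takes many round trips, which is why these parameters are candidates for tuning.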
Self-Configuring Network Monitor (SCNM)
PIs: Brian Tierney/LBNL and Deb Agarwal/LBNL

Impact and Connections
IMPACT: Build a monitoring infrastructure that will aid in debugging distributed application communication and support both active and passive monitoring.
CONNECTIONS:
SciDAC: Net100, DOE Science Grid, Astrophysics, Bandwidth Estimation, Data Grid, INCITE
Base: Network Monitoring, Data Grid, Transport Protocols
URL: www-itg.lbl.gov/Net-Mon/Self-Config.html

Milestones/Dates/Status (by project year)
Monitor Daemon
- Design base passive monitor daemon: Year 1
- Activation mechanism integration: Year 1
- Improvements to network drivers: Year 1
- Improvements and enhancements to sensor mechanism: Years 2 & 3
Activation Mechanisms
- Design basic activation mechanism: Year 1
- Develop and deploy full activation capabilities: Years 2 & 3
Results Handling Infrastructure
- TCP dump viewing capabilities: Year 1
- Develop improved data viewing capabilities: Years 2 & 3
Deployment of Monitors
- Deployment to initial ESnet sites (gig-E): Years 1 – 3
- Work with applications: Years 2 & 3
- Additional ESnet sites: Years 2 & 3

Novel Ideas
- A secure monitoring infrastructure that applications can use to monitor the performance of their own data streams
- Passive: introduces traffic only in the form of monitoring data and requests for monitoring

Tasks Involved
- Develop a monitor activation mechanism
- Develop monitor software and hardware
- Develop data collection and display capabilities
- Deploy monitors
- Work with applications

SCNM: Developing a distributed passive network monitoring system
Date Prepared: 1/7/02
High-Performance Network Research Base Project
MICS Program Manager: Thomas Ndousse
High-Performance Transport Protocols
PI: Wu-chun (Wu) Feng, Los Alamos National Laboratory and The Ohio State University

Impact and Connections
IMPACT.
Dynamic Right-Sizing
- Auto-tuned, order-of-magnitude increase in throughput.
- Vendor adoption, e.g., IRIX, Linux (still in the works)
- Potential integration into GridFTP, Web100, Net100.
RAPID
- Sliding reliability semantics may result in adoption of RAPID by the LANL large-data visualization team.
CONNECTIONS.
Dynamic Right-Sizing: Web100, Net100, DOE Science Grid, Particle Physics Data Grid, Earth System Grid II
RAPID: The LANL large-data visualization team, previously sponsored by the DOE NGI Corridor One project. Others?

Milestones/Dates/Status (Mon/Yr, DONE)
Simulation: Flow-Control Adaptation with Dynamic Right-Sizing
- Protocol Analysis & Design (ns-2): 12/01 (done 12/01)
- Protocol Testing & Evaluation (rudimentary): 03/02 (beta testing)
Implementation: Flow-Control Adaptation with Dynamic Right-Sizing
- Kernel Space, Linux 2.4.x: 07/02 (beta testing)
- User Space, drsFTP: 01/03 (alpha testing)
- Protocol Testing & Evaluation (rudimentary): 03/03
- Potential Integration with GridFTP: 04/03
- Deployment (kernel & user space): 07/03
Simulation: RAPID
- Effect of packet spacing: 03/02 (preliminaries)
- Definition of API to middleware: 03/02 (preliminaries)
- Sliding reliability: 07/03

The Novel Ideas
Dynamic Right-Sizing: TCP Flow-Control Adaptation for Grids & the Next-Generation Internet
Automatically enhance network performance over the WAN by as much as an order of magnitude while abiding by TCP semantics.
RAPID: Rate-Adjusting Protocol for Internet Delivery
Provide smoother QoS support over the best-effort Internet for grids and NGI while minimizing the need for widespread deployment of DiffServ or IntServ.

Goal: To significantly improve network performance in support of all computing environments, particularly grids and NGI.
TCP/IP
- Make the network fast but TCP friendly.
- Eliminate TCP’s flow-control bottleneck by automatically tuning buffer sizes.
RAPID
- Make the network more adaptable.
- Smooth QoS support over a best-effort network.
- User-settable reliability, providing a spectrum of QoS from unreliable UDP to reliable TCP.

January 16, 2002
High-Performance Network Research - Base Project
MICS Program Manager: Thomas Ndousse-Fetter
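The flow-control bottleneck that buffer tuning addresses comes down to simple arithmetic: a TCP sender can have at most one window of data in flight per round trip, so throughput is capped at window divided by RTT, and filling a path requires a buffer of about bandwidth times RTT. The sketch below only works through that arithmetic. It is not the Dynamic Right-Sizing algorithm, the function names are made up for this example, and the 622 Mb/s and 70 ms figures are illustrative values rather than measurements from the project.

```python
def bandwidth_delay_product_bytes(bandwidth_bps, rtt_s):
    """Buffer size, in bytes, needed to keep a path of this speed and RTT full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.070          # assumed 70 ms wide-area round-trip time
    link = 622e6         # assumed 622 Mb/s path
    default_window = 65535  # largest window without TCP window scaling

    needed = bandwidth_delay_product_bytes(link, rtt)
    capped = window_limited_throughput_bps(default_window, rtt)
    print(f"buffer needed: {needed / 1e6:.2f} MB")
    print(f"throughput with a 64 KB window: {capped / 1e6:.2f} Mb/s")
```

With these example numbers the untuned window leaves the path almost entirely idle, which is the gap that automatic buffer sizing is meant to close.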
IQ-ECho
PIs: Schwan, Ahamad, Eisenhauer, Yalamanchili -- Georgia Institute of Technology

Impact and Connections
IQ-ECho IMPACT.
– enable network-aware adaptable applications
– cross-layer information exchanges will make effective runtime tradeoffs in quality vs. performance across the protocol, middleware, and application levels
– enable the creation of efficient and adaptable Grid data services
CONNECTIONS:
– Remote visualization (Supernova Visualization), source-based filtering (Oak Ridge), program monitoring and steering
– Extensible cluster platforms (NSF, DOE)
– Remote sensing, monitoring, and security (DARPA, NSF)

Milestones/Dates/Status (Mon/Yr, DONE)
Year 1
- performance attributes in ECho middleware: 4/02
- select and implement sample application: 6/02
- create instrumentation for performance attributes: 8/02
Year 2
- evaluate and tune middleware: 3/03
- enable application for adaptation: 3/03
- extend/create configurable network protocols: 6/03
Year 3
- integrate IQ-ECho with access grid software: 3/04
- demonstrate benefits in access grid environment: 6/04

IQ-ECho – Interactive Quality of Service Across Heterogeneous Hardware/Software
- integrated QoS management through quality attributes
- dynamic code generation relocates application-level functionality to the most appropriate location
- configurable protocols and kernel-level monitoring provide the system-level support required for online quality management
- vertical programming allows extending platforms while programming applications

IQ-ECho Novel Ideas
- represent information flows as event streams in event-based IQ-ECho middleware
- use dynamic code generation to migrate application-level filtering/data processing to appropriate network locations
- use network-level feedback to drive application-level quality of service adaptations

Date Prepared: 1/10/02
High-Performance Network Research Base Project
MICS Program Manager: Thomas Ndousse
PingER
PI: Les Cottrell, SLAC

Impact and Connections
IMPACT:
- increase network and Grid application bulk throughput over high-delay, high-bandwidth networks (like DOE’s ESnet)
- provide troubleshooting information for networkers and users by identifying the onset and magnitude of performance changes, and whether they appear in the application or the network
- provide a network performance data base, analysis, and navigable reports from active monitoring
CONNECTIONS:
SciDAC: High Energy Nuclear Physics, Bandwidth Estimation, Data Grid, INCITE
Base: Network Monitoring, Data Grid, Transport Protocols

Milestones/Dates/Status (Mon/Yr, DONE)
Infrastructure development
- develop simple window tuning tool: 08/01 (done 08/01)
- initial infrastructure developed: 12/01 (done 12/01)
- infrastructure installed at one site: 01/02 (done 01/02)
- improve and extend infrastructure: 06/02
- deploy at 2nd site: 08/02
- evaluate GIMI/DMF alternatives: 10/02
- extend deployment to PPDG sites: 03/03
Develop analysis/reporting tools
- first version for standard apps: 02/02
Integrate new apps & net tools
- GridFTP and demo: 05/05
- INCITE tools: 08/02
- BW measure tools (e.g. pathload): 01/03
Compare & validate tools
- GridFTP: 09/02
- BW tools: 04/03

PingER Novel Ideas
- Low-impact network performance measurements to most of the Internet-connected world, providing delay, loss, and connectivity information over long time periods
- Network AND application high-throughput performance measurements, allowing comparisons and identification of bottlenecks
- Continuous, robust measurement, analysis, and web-based reporting of results, available worldwide
- Simple infrastructure enabling rapid deployment, locating within an application host, and local site management to avoid security issues

PingER: Active End-to-end performance monitoring for the Research and Education communities
Tasks:
- develop/deploy simple, robust ssh-based active end-to-end measurement and management infrastructure
- develop analysis/reporting tools
- integrate new application and network measurement tools into the infrastructure
- compare & validate various tools, and determine regions of applicability

www-iepm.slac.stanford.edu
Date Prepared: 1/7/02
High-Performance Network Research Base Project
MICS Program Manager: Thomas Ndousse
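The delay and loss figures that an active, ping-style monitor reports can be derived from raw probe results with a few lines of code. The sketch below assumes each probe yields either a round-trip time in milliseconds or `None` when no reply arrived. The function name, input format, and choice of summary statistics are assumptions made for this example and are not taken from the PingER software.

```python
from statistics import mean, median


def summarize_pings(rtts_ms):
    """Summarize one batch of probes.

    rtts_ms: list with one entry per probe sent; a float RTT in
        milliseconds, or None if the probe was lost.
    Returns a dict with the loss fraction and min/median/mean RTT
    (RTT fields are None when every probe was lost).
    """
    if not rtts_ms:
        raise ValueError("no probes in batch")
    replies = [r for r in rtts_ms if r is not None]
    summary = {"loss": 1 - len(replies) / len(rtts_ms)}
    if replies:
        summary.update(min=min(replies), median=median(replies), mean=mean(replies))
    else:
        summary.update(min=None, median=None, mean=None)
    return summary


if __name__ == "__main__":
    print(summarize_pings([10.0, None, 30.0, 20.0]))
```

Keeping the minimum alongside the median is useful in practice: the minimum approximates the unloaded path delay, while the gap between the two hints at queueing.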
Stability Modeling and Control of Transport Protocols for High-Speed Data Grids
Nageswara S. Rao, Oak Ridge National Laboratory

Impact and Connections
IMPACT.
- Provides controlled end-to-end dynamics for grids over wide-area networks: a significant step beyond the state of the art
- Fundamentally new classes of transport methods based on sound analysis and experimentation: inexpensive and easy to use
- Provides the needed quality of service for control over wide-area networks for data and instrument grids
CONNECTIONS:
- Net100 project: will use the proposed instruments and will provide certain measurement modules
- Terascale Supernova Initiative can significantly benefit from the proposed control methods; we are in communication

Milestones/Dates/Status
Detailed rigorous analysis:
- attractor analysis: Feb 02 / Feb 03
- conditions of chaos: Apr 02 / Apr 03
Grid network instrumentation design:
- sufficiency proofs of measurements: Mar 02 / Mar 03
- detailed module design: June 02
Proof of concept implementations:
- high throughput: July 02
- bounded higher order delay moments: Aug 02 / Sept 03
Application and testing:
- identification of representative problem: Feb 03
- performance study: Sept 03

The Novel Ideas
- Detailed analysis of transport dynamics using non-linear control and chaos theory; showed that TCP generates “complicated” phase space attractors
- Developed the concept of grid network instruments to perform measurement and traffic engineering using light-weight in-situ modules; analytically showed their performance optimality
- Novel transport control methods for end-to-end control for:
  - high throughput using concurrent window and graded control
  - controlled dynamics using multiple throttle methods

Understand and Control the End-to-End Transport Dynamics of High-Speed Grids
- Detailed analysis of transport processes: rigorous treatment using non-linear control and chaos theory
- Develop provably effective transport methods for high throughput and end-to-end dynamics control
- Implement and test on grid environments

Date Prepared: 01/09/02
High-Performance Network Research Base Project
MICS Program Manager: Thomas D. Ndousse
Pushing the Network Simulation Envelope
W. R. Wing - Oak Ridge National Laboratory

Impact and Connections
IMPACT: SSFnet will be the first network simulator able to:
- Fully model SciDAC Terascale applications
- Allow SciDAC developers to tune their applications to evolving mixed-technology network environments
- Allow testing/confirmation of future SciDAC-developed network protocols
CONNECTIONS: A key element of SSFnet’s verifiability is our plan to directly incorporate the Net100/Web100 MIB in the simulator. Comparison of real-life MIB measurements with the SSF-instrumented MIB will provide confirmation of SSFnet simulation fidelity. However, this does require deployment of at least some SciDAC applications on Web100/Net100 platforms.

Milestones/Dates/Status (Proposed Milestone: Proposed Date, Actual Date)
- Verify Shared-mem architectures (IBM, Compaq, Solaris): Q1 FY02, Complete
- Develop initial DM scheduler: Q3 FY02
- Develop MIB instrumentation: Q4 FY02
- Develop application-level IDE: Q2 FY03
- Develop 2nd-Gen DML-based Scheduler: Q4 FY03
- Distribute to DOE community: Q4 FY03

SSFnet Novel Ideas
SSFnet will be the first network simulator with verifiable instrumentation
- We plan to include (not model) the Net100/Web100 MIB
- Net100/Web100 MIB data will be accumulated for direct comparison
SSFnet will be the first production-quality Distributed Memory simulator
- Domain Modeling Language will automate decomposition
SSFnet will be the first simulator able to tackle SciDAC-scale problems

SSFnet - Creating a Terascale network simulator that can model SciDAC applications
Tasks:
- Verify SM SSFnet on candidate architectures
- Develop initial DM version of SSFnet
- Develop and verify instrumentation
- Develop application-level IDE
- Distribute to DOE network research community
- Develop 2nd-Gen DM scheduler and DML

Date Prepared: 01/08/02
High-Performance Network Research Base Project
MICS Program Manager: T. Ndousse