1
The Design and Demonstration of the UltraLight Network Testbed
http://ultralight.caltech.edu
Presented by Xun Su (xsu@hep.caltech.edu)
GridNets 2006, Oct 2nd, 2006
2
Long-Term Trends in Network Traffic Volumes: 300-1000X per 10 Years
- SLAC traffic ~400 Mbps; growth in steps (ESnet limit): ~10X per 4 years
- Summer '05: 2 x 10 Gbps links, one for production, one for R&D
- Projected: ~2 Terabits/s by ~2014
[Chart: ESnet accepted traffic 1990-2005, in terabytes per month, showing exponential growth (avg. +82%/year for the last 15 years) with capacity progress in 10 Gbit/s steps. Sources: W. Johnston, L. Cottrell. Consistency check below.]
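As a back-of-the-envelope consistency check (my arithmetic, not from the slide), compounding the quoted average of +82% per year gives

```latex
(1.82)^{10} \approx 4 \times 10^{2}, \qquad (1.82)^{15} \approx 8 \times 10^{3}
```

i.e. roughly 400X per decade, inside the quoted 300-1000X band, and a factor of several thousand over the 15-year span shown.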
3
Motivation
Provide the network advances required to enable petabyte-scale analysis of globally distributed data. Current Grid-based infrastructures provide massive computing and storage resources, but are limited by their treatment of the network as an external, passive, and largely unmanaged resource.
The mission of UltraLight is to:
- Develop and deploy prototype global services that broaden existing Grid computing systems by promoting the network to an actively managed component.
- Integrate and test UltraLight in the Grid-based physics production and analysis systems currently under development in ATLAS and CMS.
- Engineer and operate a trans- and intercontinental optical network testbed for the broader community.
4
UltraLight Backbone
The UltraLight testbed is a non-standard core network with dynamic links and varying bandwidth interconnecting our nodes. The core of UltraLight evolves dynamically as a function of the resources available on other backbones such as NLR, HOPI, Abilene and ESnet.
The main resources for UltraLight:
- US LHCnet (IP, L2VPN, CCC)
- Abilene (IP, L2VPN)
- ESnet (IP, L2VPN)
- UltraScienceNet (L2)
- Cisco Research Wave (10 Gb Ethernet over NLR)
- NLR Layer 3 service
- HOPI
- NLR waves (Ethernet; provisioned on demand)
UltraLight nodes: Caltech, SLAC, FNAL, UF, UM, StarLight, CENIC PoP at LA, CERN, Seattle
5
UltraLight topology: point of presence
6
UltraLight Network Engineering
GOAL: Determine an effective mix of bandwidth-management techniques for this application space, particularly:
- Best-effort and "scavenger" traffic using "effective" protocols (see the marking sketch below)
- MPLS with QoS-enabled packet switching
- Dedicated paths provisioned with TL1 commands and GMPLS
PLAN: Develop and test the most cost-effective integrated combination of network technologies on our unique testbed:
- Exercise UltraLight applications on NLR, Abilene and campus networks, as well as LHCNet and our international partners' networks
- Deploy and systematically study ultrascale protocol stacks (such as FAST), addressing issues of performance and fairness
- Use MPLS/QoS and other forms of bandwidth management to optimize end-to-end performance among a set of virtualized disk servers
- Address "end-to-end" issues, including monitoring and end hosts
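As a small, hedged example of the "scavenger" idea referenced above (an illustration, not UltraLight code): an application can mark its bulk-transfer sockets with the DiffServ lower-priority class so that QoS-enabled routers can let best-effort traffic preempt it.

```python
import socket

# Minimal sketch: mark a socket's traffic with the DiffServ "scavenger" class (CS1)
# so routers with QoS policies can deprioritize bulk transfers.
DSCP_CS1 = 8                 # "scavenger" / lower-effort code point
TOS_VALUE = DSCP_CS1 << 2    # DSCP occupies the upper 6 bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_VALUE)

# The socket is then used as usual; the marking travels in every IP header.
# sock.connect(("data-mover.example.org", 5001))   # hypothetical endpoint
```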
7
UltraLight: Effective Protocols
The protocols used to reliably move data are a critical component of physics' "end-to-end" use of the network. TCP is the most widely used protocol for reliable data transport, but it becomes increasingly ineffective as the bandwidth-delay product of the network grows. UltraLight is exploring extensions to TCP (HSTCP, Westwood+, HTCP, FAST, MaxNet) designed to maintain fair sharing of networks while allowing efficient, effective use of them.
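To make the bandwidth-delay problem concrete, a standard back-of-the-envelope calculation (not taken from the slide): a single TCP stream must keep roughly one bandwidth-delay product of data in flight, so on a 10 Gb/s path with a ~100 ms round-trip time

```latex
\mathrm{BDP} = C \times \mathrm{RTT} = 10\,\mathrm{Gb/s} \times 0.1\,\mathrm{s}
             = 10^{9}\,\mathrm{bits} \approx 125\,\mathrm{MB}
```

With 1,500-byte segments that is a congestion window of roughly 83,000 packets; after a single loss, standard Reno halves the window and regrows it by only one segment per RTT, so recovery takes on the order of 40,000 RTTs (about 70 minutes at 100 ms), which is why loss-based TCP struggles at this scale.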
8
FAST Protocol Comparisons
[Charts comparing FAST with other TCP variants on a gigabit WAN:]
- Utilization: ~5x higher, with small queueing delay (FAST: 95% vs. Reno: 19%)
- Under random packet loss: ~10x higher throughput; FAST is resilient to random loss
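The random-loss comparison reflects a well-known property of loss-based TCP (a standard result added here for context, not taken from the slide): the Mathis et al. estimate of steady-state Reno throughput,

```latex
\text{throughput} \;\approx\; \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}, \qquad C \approx 1.2
```

so throughput falls as 1/sqrt(p) even when the loss rate p comes from random errors rather than congestion, whereas a delay-based protocol such as FAST ties its sending rate to queueing delay and is far less sensitive to such losses.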
9
Optical Path Developments
Emerging "light path" technologies are arriving:
- They can extend and augment existing Grid computing infrastructures, currently focused on CPU and storage, to include the network as an integral Grid component.
- They appear to be the most effective way to offer network resource provisioning on demand between end systems.
We are developing a multi-agent system for secure light path provisioning based on dynamic discovery of the topology in distributed networks (VINCI). We are working to further develop this distributed agent system and to provide integrated network services capable of efficiently using and coordinating shared, hybrid networks, improving the performance and throughput of data-intensive Grid applications. This includes services able to dynamically configure routers and to aggregate local traffic onto dynamically created optical connections.
10
GMPLS Optical Path Provisioning
A collaboration between UltraLight and Enlightened Computing: interconnecting Calient switches across the US to form a unified GMPLS control plane.
Control plane: IPv4 connectivity between sites for control messages.
Data plane:
- Cisco Research wave between LA and StarLight
- EnLIGHTened wave between StarLight and MCNC Raleigh
- LONI wave between StarLight and LSU Baton Rouge, over LONI DWDM
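Purely as a conceptual illustration of what provisioning a dedicated path involves at the switch level, the sketch below sends a TL1-style command to an optical cross-connect. The port number, login and command strings are placeholders (real TL1 dialects are vendor- and release-specific), and in the testbed described here paths are set up through GMPLS signalling between the switches rather than by hand.

```python
import socket

# Conceptual sketch only: a provisioning agent pushing a TL1 command to an
# optical switch. Host, port, credentials, verbs and AIDs are hypothetical.

SWITCH = "oxc.example.net"     # hypothetical management address
TL1_PORT = 3083                # commonly used TL1-over-TCP port (assumption)

def tl1_send(sock, command):
    """Send one TL1 command (terminated by ';') and return the raw response."""
    sock.sendall(command.encode() + b"\r\n")
    return sock.recv(65536).decode(errors="replace")

with socket.create_connection((SWITCH, TL1_PORT), timeout=10) as s:
    # Log in (GR-831 style ACT-USER); credentials are placeholders.
    print(tl1_send(s, "ACT-USER::admin:100::secret;"))
    # Illustrative cross-connect between two ports -- NOT a real Calient command.
    print(tl1_send(s, "ENT-CRS::IN-PORT-1,OUT-PORT-7:101;"))
```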
11
GMPLS Optical Path Network Diagram
12
Monitoring for UltraLight
Real-time, end-to-end network monitoring is essential for UltraLight. We need to understand our network infrastructure and track its performance both historically and in real time, to enable the network as a managed, robust component of our infrastructure.
- Caltech's MonALISA: http://monalisa.cern.ch
- SLAC's IEPM: http://www-iepm.slac.stanford.edu/bw/
We have a new effort to push monitoring out to the "ends" of the network: the hosts involved in providing services, and user workstations (see the sketch below).
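As an illustration of what host-level monitoring at the network "ends" can look like, here is a minimal sketch (an assumption, not UltraLight's actual code) that samples the Linux per-interface byte counters; in a real deployment the numbers would be reported to MonALISA or IEPM rather than printed.

```python
import time

def rx_tx_bytes(iface="eth0"):
    """Read cumulative RX/TX byte counters for one interface from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                fields = line.split(":", 1)[1].split()
                return int(fields[0]), int(fields[8])   # rx_bytes, tx_bytes
    raise ValueError(f"interface {iface!r} not found")

# Sample over a 5-second window and report the average rates.
rx0, tx0 = rx_tx_bytes()
time.sleep(5)
rx1, tx1 = rx_tx_bytes()
print(f"rx {(rx1 - rx0) * 8 / 5e6:.1f} Mb/s, tx {(tx1 - tx0) * 8 / 5e6:.1f} Mb/s")
```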
13
MonALISA UltraLight Repository
The UL repository: http://monalisa-ul.caltech.edu:8080/
14
The Functionality of the VINCI System
[Diagram: three sites (A, B, C), each running a MonALISA ML agent coordinated through ML proxy services, with agents controlling the network stack at Layer 3 (routers), Layer 2 (Ethernet LAN-PHY or WAN-PHY), and Layer 1 (DWDM fiber).]
15
SC|05: Global Lambdas for Particle Physics
We previewed the global-scale data analysis of the LHC era, using a realistic mixture of streams: organized transfers of multi-TB event datasets, plus numerous smaller flows of physics data that absorb the remaining capacity.
- We used twenty-two [*] 10 Gbps waves to carry bidirectional traffic between Fermilab, Caltech, SLAC, BNL, CERN and other partner Grid sites, including Michigan, Florida, Manchester, Rio de Janeiro (UERJ) and Sao Paulo (UNESP) in Brazil, Korea (KNU), and Japan (KEK).
- The analysis software suites are based on the Grid-enabled UltraLight Analysis Environment (UAE) developed at Caltech and Florida, as well as the bbcp and Xrootd applications from SLAC, and dCache/SRM from FNAL.
- Monitored by Caltech's MonALISA global monitoring and control system.
[*] 15 waves at the Caltech/CACR booth and 7 at the FNAL/SLAC booth
17
Switch and Server Interconnections at the Caltech Booth
- 15 10G waves
- 64 10G switch ports: 2 fully populated Cisco 6509Es
- 43 Neterion 10 GbE NICs
- 70 nodes with 280 cores
- 200 SATA disks
- 40 Gbps (20 HBAs) to StorCloud, Thursday - Sunday
18
HEP at SC2005: Global Lambdas for Particle Physics
Monitoring NLR, Abilene/HOPI, LHCNet, USNet, TeraGrid, PWave, SCInet, Gloriad, JGN2, WHREN, other international R&E nets, and 14,000+ Grid nodes at 250 sites (250k parameters) simultaneously. [I. Legrand]
19
Global Lambdas for Particle Physics: Results (Caltech/CACR and FNAL/SLAC Booths)
151 Gbps peak, 100+ Gbps of throughput sustained for hours:
- 475 Terabytes of physics data transported in < 24 hours
- 131 Gbps measured by the SCInet BWC team on 17 of our waves
- Sustained rate of 100+ Gbps translates to > 1 Petabyte per day (arithmetic check below)
- Linux kernel optimized for TCP-based protocols, including Caltech's FAST
- Surpassing our previous SC2004 BWC record of 101 Gbps
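The "> 1 Petabyte per day" projection is straightforward arithmetic from the sustained rate:

```latex
100\,\mathrm{Gb/s} \times 86{,}400\,\mathrm{s/day}
  = 8.64\times 10^{15}\,\mathrm{bits/day}
  \approx 1.08\times 10^{15}\,\mathrm{bytes/day}
  \approx 1.1\,\mathrm{PB/day}
```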
20
Above 100 Gbps for Hours
21
475 TBytes Transported in < 24 Hours
Sustained Peak Projects to > 1 Petabyte Per Day
22
It was the first time: a struggle for the equipment and the team. We will stabilize, package and more widely deploy these methods and tools in 2006.
23
SC05 BWC Lessons Learned
Take-aways from this marathon exercise:
- An optimized Linux kernel (2.6.12 + FAST-TCP + NFSv4) for data transport, after 7 full kernel-build cycles in 4 days (see the tuning sketch below)
- Scaling up SRM/GridFTP to near 10 Gbps per wave, using Fermilab's production clusters
- A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
- Extensions of SLAC's Xrootd, an optimized low-latency file access application for clusters, across the wide area
- Understanding of the limits of 10 Gbps-capable computer systems, network switches and interfaces under stress
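For context, the sketch below shows the standard Linux sysctl knobs that this kind of host tuning involves, sized for a ~10 Gb/s, ~100 ms path (BDP ~ 125 MB). These are illustrative assumptions, not the team's actual SC|05 settings; the FAST congestion-control module itself came from a patched kernel rather than a stock sysctl.

```python
# Hedged sketch of typical end-host TCP tuning for a high bandwidth-delay path.
SETTINGS = {
    "net.core.rmem_max": 134217728,                 # 128 MB max receive buffer
    "net.core.wmem_max": 134217728,                 # 128 MB max send buffer
    "net.ipv4.tcp_rmem": "4096 87380 134217728",    # min / default / max
    "net.ipv4.tcp_wmem": "4096 65536 134217728",
    # "net.ipv4.tcp_congestion_control": "htcp",    # pick an available high-speed stack
}

def write_sysctl(name, value):
    """Write one sysctl via /proc/sys (requires root)."""
    path = "/proc/sys/" + name.replace(".", "/")
    with open(path, "w") as f:
        f.write(str(value))

for name, value in SETTINGS.items():
    write_sysctl(name, value)
    print(f"{name} = {value}")
```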
24
Thank You