1
UltraLight Overview
Shawn McKee / University of Michigan
USATLAS Tier1 & Tier2 Network Planning Meeting, December 14, 2005 - BNL
2
The UltraLight Project
UltraLight is:
- A four-year, $2M NSF ITR funded by MPS
- Application-driven network R&D
- A collaboration of BNL, Caltech, CERN, Florida, FIU, FNAL, Internet2, Michigan, MIT, SLAC
- Significant international participation: Brazil, Japan, Korea, among many others
Goal: Enable the network as a managed resource.
Meta-Goal: Enable physics analyses and discoveries which could not otherwise be achieved.
3
UltraLight Backbone
UltraLight has a non-standard core network with dynamic links and varying bandwidth interconnecting our nodes: an optical hybrid global network.
The core of UltraLight is dynamically evolving as a function of available resources on other backbones such as NLR, HOPI, Abilene or ESnet.
The main resources for UltraLight:
- LHCnet (IP, L2VPN, CCC)
- Abilene (IP, L2VPN)
- ESnet (IP, L2VPN)
- Cisco NLR wave (Ethernet)
- Cisco Layer 3 10GE network
- HOPI NLR waves (Ethernet; provisioned on demand)
UltraLight nodes: Caltech, SLAC, FNAL, UF, UM, StarLight, CENIC PoP at LA, CERN
4
UltraLight Layer 1/2 Connectivity: the UL Layer 1/2 network (courtesy of Dan Nae)
5
UltraLight Layer 3 Connectivity: the current UltraLight Layer 3 connectivity as of mid-October 2005 (courtesy of Dan Nae)
6
UltraLight Network Usage
7
UltraLight Sites
UltraLight currently has 10 participating core sites (shown alphabetically). Details and diagrams for each site will be reported Tuesday during "Network" day.

Site      Monitor          Type             Storage   Out of Band
BNL       MonALISA         OC48, 10 GE 06   TBD       Y
Caltech   MonALISA         10 GE            1 TB      Y
CERN      MonALISA         OC192            9 TB      Y
FIU       MonALISA         10 GE            TBD       Y
FNAL      ?                                 TBD       Y
I2        ?                MPLS L2VPN       TBD       Y
MIT       MonALISA         OC48             TBD       Y
SLAC      MonALISA/IEPM                     TBD       Y
UF        ?                                 1 TB      Y
UM        MonALISA         10 GE            9 TB      Y
8
UltraLight Network: Phase I (plans for Phase I from Oct. 2004)
- Implementation via "sharing" with HOPI/NLR
- Also the LA-CHI Cisco/NLR research wave and the DOE UltraScienceNet wave SNV-CHI (LambdaStation)
- Connectivity to FLR to be determined
- MIT involvement welcome, but unfunded
- AMPATH, UERJ, USP
9
UltraLight Network: Phase II
- Move toward multiple "lambdas"
- Bring in FLR, as well as BNL (and MIT)
- AMPATH, UERJ, USP
General comment: We are almost here!
10
UltraLight Network: Phase III
- Move into production
- Optical switching fully enabled amongst primary sites
- Integrated international infrastructure
- AMPATH, UERJ, USP
Certainly reasonable sometime in the next few years...
11
Workplan/Phased Deployment
UltraLight envisions a 4-year program to deliver a new, high-performance, network-integrated infrastructure:
- Phase I will last 12 months and focus on deploying the initial network infrastructure and bringing up first services
- Phase II will last 18 months and concentrate on implementing all the needed services and extending the infrastructure to additional sites
- Phase III will complete UltraLight and last 18 months; the focus will be on a transition to production in support of LHC physics (+ eVLBI astronomy, + ?)
We are HERE!
12
UltraLight Network Engineering
GOAL: Determine an effective mix of bandwidth-management techniques for this application space, particularly:
- Best-effort/"scavenger" using effective ultrascale protocols
- MPLS with QoS-enabled packet switching
- Dedicated paths arranged with TL1 commands, GMPLS
PLAN: Develop and test the most cost-effective integrated combination of network technologies on our unique testbed:
1. Exercise UltraLight applications on NLR, Abilene and campus networks, as well as LHCNet and our international partners
   - Progressively enhance Abilene with QoS support to protect production traffic
   - Incorporate emerging NLR and RON-based lightpath and lambda facilities
2. Deploy and systematically study ultrascale protocol stacks (such as FAST), addressing issues of performance and fairness
3. Use MPLS/QoS and other forms of bandwidth management, and adjustments of optical paths, to optimize end-to-end performance among a set of virtualized disk servers
13
UltraLight: Effective Protocols
The protocols used to reliably move data are a critical component of physics "end-to-end" use of the network.
TCP is the most widely used protocol for reliable data transport, but it becomes increasingly ineffective as network bandwidth-delay products grow.
UltraLight is exploring extensions to TCP (HSTCP, Westwood+, HTCP, FAST) designed to maintain fair sharing of networks while allowing efficient, effective use of these networks.
Currently FAST is in our "UltraLight kernel" (a customized 2.6.12-3 kernel), which was used at SC2005. We are planning to broadly deploy a related kernel with FAST; longer term we can then continue with access to FAST, HS-TCP, Scalable TCP, BIC and others.
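For illustration only (this is not UltraLight code): on a Linux kernel that ships the desired congestion control module, an application can select the algorithm per socket. The sketch below assumes Python on Linux; the algorithm name "bic" and the host/port are placeholders.

```python
# Illustrative sketch, not UltraLight code: pick a TCP congestion control
# algorithm per socket on Linux.  The kernel must ship the module you name;
# "bic" and the host/port below are placeholders.
import socket

TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)  # value 13 on Linux

def available_algorithms():
    """Congestion control modules offered by the running kernel."""
    with open("/proc/sys/net/ipv4/tcp_available_congestion_control") as f:
        return f.read().split()

def connect_with_cc(host, port, algorithm="bic"):
    """Open a TCP connection that uses the named congestion control algorithm."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, algorithm.encode())
    s.connect((host, port))
    return s

if __name__ == "__main__":
    print("Kernel offers:", available_algorithms())
```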
14
UltraLight Kernel Development
Having a standard tuned kernel is very important for a number of UltraLight activities:
1. Breaking the 1 GB/sec disk-to-disk barrier
2. Exploring TCP congestion control protocols
3. Optimizing our capability for demos and performance
The current kernel incorporates the latest FAST and Web100 patches over a 2.6.12-3 kernel and includes the latest RAID and 10GE NIC drivers.
The UltraLight web page (http://www.ultralight.org) has a Kernel page, reached from the Workgroup->Network page, which provides the details.
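A minimal sketch of why kernel tuning matters for these targets: the bandwidth-delay product (BDP) sets the socket buffer a single TCP stream needs, and the kernel's limits must be at least that large. The 10 Gbps rate and 120 ms RTT below are example numbers, not measured UltraLight values.

```python
# Illustrative sketch: the bandwidth-delay product sets the socket buffer a
# single TCP stream needs; the tuned kernel's limits must be at least as large.
# The 10 Gbps rate and 120 ms RTT are example numbers, not measured UL values.
def read_sysctl(name):
    with open("/proc/sys/net/" + name.replace(".", "/")) as f:
        return f.read().split()

def required_buffer_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to keep the pipe full."""
    return int(rate_bps * rtt_s / 8)

if __name__ == "__main__":
    need = required_buffer_bytes(10e9, 0.120)        # ~150 MB for 10 Gbps at 120 ms
    rmem_max = int(read_sysctl("core.rmem_max")[0])  # /proc/sys/net/core/rmem_max
    tcp_rmem = read_sysctl("ipv4.tcp_rmem")          # min / default / max
    print(f"BDP needs ~{need / 2**20:.0f} MiB of socket buffer")
    print(f"core.rmem_max = {rmem_max}, ipv4.tcp_rmem = {tcp_rmem}")
```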
15
MPLS/QoS for UltraLight
UltraLight plans to explore the full range of end-to-end connections across the network, from best-effort, packet-switched transport through dedicated end-to-end light paths.
MPLS paths with QoS attributes fill a middle ground in this network space and allow fine-grained allocation of virtual pipes, sized to the needs of the application or user.
UltraLight, in conjunction with the DoE/MICS-funded TeraPaths effort, is working toward extensible solutions for implementing such capabilities in next-generation networks.
TeraPaths initial QoS test at BNL. TeraPaths URL: http://www.atlasgrid.bnl.gov/terapaths/
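For illustration (this is not the TeraPaths interface): an application can mark its traffic with a DSCP codepoint so that QoS-capable routers along an MPLS/QoS path can classify it; whether the marking is honored end-to-end is a per-domain policy decision. A minimal Python sketch:

```python
# Illustrative sketch, not the TeraPaths interface: mark a socket's traffic with
# a DSCP codepoint so QoS-capable routers can classify it.  Whether the marking
# is honored end-to-end is a per-domain policy decision.
import socket

DSCP_EF  = 46   # Expedited Forwarding: low-loss, low-latency class
DSCP_CS1 = 8    # CS1, commonly used as the "scavenger" / less-than-best-effort class

def marked_socket(dscp):
    """TCP socket whose outgoing packets carry the given DSCP value."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)  # DSCP occupies the top 6 bits
    return s

# Example: a bulk transfer politely marked as scavenger traffic
bulk = marked_socket(DSCP_CS1)
```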
16
Optical Path Plans
Emerging "light path" technologies are becoming popular in the Grid community:
- They can extend and augment existing grid computing infrastructures, currently focused on CPU/storage, to include the network as an integral Grid component.
- These technologies seem to be the most effective way to offer network resource provisioning on demand between end systems.
A major capability we are developing in UltraLight is the ability to dynamically switch optical paths across the node, bypassing electronic equipment via a fiber cross connect.
The ability to switch dynamically provides additional functionality and also models the more abstract case where switching is done between colors (ITU grid lambdas).
17
MonALISA to Manage Light Paths
- Dedicated modules to monitor and control optical switches; used to control the CALIENT switch @ CIT and the GLIMMERGLASS switch @ CERN
- ML agent system used to create global paths
- The algorithm can be extended to include prioritisation and pre-allocation
18
Monitoring for UltraLight
Network monitoring is essential for UltraLight. We need to understand our network infrastructure and track its performance, both historically and in real time, to enable the network as a managed, robust component of our overall infrastructure.
There are two ongoing efforts we are leveraging to help provide the monitoring capability required:
- IEPM: http://www-iepm.slac.stanford.edu/bw/
- MonALISA: http://monalisa.cern.ch
We are also looking at new tools like perfSONAR which may help provide a monitoring infrastructure for UltraLight.
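As a hedged sketch of the kind of active measurement these frameworks schedule (this is not IEPM or MonALISA code), here is a minimal memory-to-memory throughput probe: run it with "server" on one host and "client <server-host>" on the other. The port, duration and block size are arbitrary placeholders.

```python
# Hedged sketch of a minimal memory-to-memory throughput probe, the kind of
# active test the monitoring frameworks schedule; not IEPM or MonALISA code.
# Port, duration and block size are arbitrary placeholders.
import socket, sys, time

PORT, DURATION, BLOCK = 5201, 10, 1 << 20

def server():
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, peer = srv.accept()
    total, start = 0, time.time()
    while (data := conn.recv(BLOCK)):
        total += len(data)
    elapsed = time.time() - start
    print(f"received {total / 1e9:.2f} GB from {peer[0]} -> {total * 8 / elapsed / 1e9:.2f} Gbps")

def client(host):
    s = socket.socket()
    s.connect((host, PORT))
    payload, deadline, sent = b"\0" * BLOCK, time.time() + DURATION, 0
    while time.time() < deadline:
        sent += s.send(payload)
    s.close()
    print(f"sent {sent / 1e9:.2f} GB in {DURATION} s")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])
```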
19
MonALISA UltraLight Repository
The UL repository: http://monalisa-ul.caltech.edu:8080/
20
End-Systems Performance
Latest disk-to-disk over 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec) with 8 TCP streams from CERN to Caltech (windows, 1 TB file, 24 JBOD disks).
Hardware: quad Opteron AMD 848 2.2 GHz processors with 3 AMD-8131 chipsets (4 64-bit/133 MHz PCI-X slots), 3 Supermicro Marvell SATA disk controllers and 24 7200 rpm SATA disks.
- Local disk I/O: 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization)
- 10 GE NIC: 7.5 Gbits/sec (memory-to-memory, with 52% CPU utilization)
- 2 x 10 GE NIC (802.3ad link aggregation): 11.1 Gbits/sec (memory-to-memory)
- Need PCI-Express, TCP offload engines
- Need 64-bit OS? Which architectures and hardware?
Discussions are underway with 3Ware, Myricom and Supermicro to try to prototype viable servers capable of driving 10 GE networks in the WAN.
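A minimal sketch, for illustration only, of a sequential-read check behind numbers like the 1.2 GBytes/sec local disk figure above; the file path is hypothetical, and production tests would use direct I/O to avoid the page cache, which this sketch does not.

```python
# Illustrative sketch of a sequential local disk read check.  The file path is
# hypothetical; real tests would use direct I/O to avoid page-cache effects,
# which this sketch skips.
import time

def read_throughput(path, block_size=4 << 20):
    """Stream a file sequentially and return GB/s."""
    total, start = 0, time.time()
    with open(path, "rb", buffering=0) as f:
        while (chunk := f.read(block_size)):
            total += len(chunk)
    return total / (time.time() - start) / 1e9

if __name__ == "__main__":
    rate = read_throughput("/data01/big/testfile")   # hypothetical test file
    print(f"sequential read: {rate:.2f} GB/s ({rate * 8:.1f} Gb/s)")
```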
21
UltraLight Global Services
- Global Services support management and co-scheduling of multiple resource types, and provide strategic recovery mechanisms from system failures
- Scheduling decisions are based on CPU, I/O and network capability and end-to-end task performance estimates, including loading effects
- Decisions are constrained by local and global policies
- Implementation: auto-discovering, multithreaded services and service engines to schedule threads, making the system scalable and robust
Global Services consist of:
- Network and System Resource Monitoring, to provide pervasive end-to-end resource monitoring information to HLS
- Network Path Discovery and Construction Services, to provide network connections appropriate (sized/tuned) to the expected use
- Policy-Based Job Planning Services, balancing policy, efficient resource use and acceptable turnaround time
- Task Execution Services, with job-tracking user interfaces and incremental re-planning in case of partial incompletion
These types of services are required to deliver a managed network; a toy sketch of one such decision follows below. Work along these lines is planned for OSG and future proposals to NSF and DOE.
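A toy sketch of the kind of admission check a path discovery/construction service might make: does a requested virtual pipe fit on a link, given its capacity and already-scheduled reservations? The data model, names and numbers are illustrative, not an UltraLight service interface.

```python
# Toy sketch only: a coarse admission check a path construction service might
# make.  The Reservation model, rates and link capacity are illustrative;
# this is not an UltraLight or MonALISA interface.
from dataclasses import dataclass

@dataclass
class Reservation:
    start: float       # seconds from now
    end: float
    rate_gbps: float

def committed(reservations, start, end):
    """Conservative estimate: sum every reservation that overlaps the window."""
    return sum(r.rate_gbps for r in reservations if r.start < end and r.end > start)

def admit(capacity_gbps, reservations, start, end, rate_gbps):
    """Accept the request only if the link capacity is never exceeded."""
    return committed(reservations, start, end) + rate_gbps <= capacity_gbps

existing = [Reservation(0, 3600, 3.0), Reservation(1800, 7200, 2.0)]
print(admit(10.0, existing, start=0, end=3600, rate_gbps=4.0))   # True: 3 + 2 + 4 <= 10
```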
22
UltraLight Application in 2008
Node1> fts -vvv -in mercury.ultralight.org:/data01/big/zmumu05687.root -out venus.ultralight.org:/mstore/events/data -prio 3 -deadline +2:50 -xsum
FTS: Initiating file transfer setup...
FTS: Remote host responds ready
FTS: Contacting path discovery service
PDS: Path discovery in progress...
PDS: Path RTT 128.4 ms, best effort path bottleneck is 10 GE
PDS: Path options found:
PDS:   Lightpath option exists end-to-end
PDS:   Virtual pipe option exists (partial)
PDS:   High-performance protocol capable end-systems exist
FTS: Requested transfer 1.2 TB file transfer within 2 hours 50 minutes, priority 3
FTS: Remote host confirms available space for DN=smckee@ultralight.org
FTS: End-host agent contacted...parameters transferred
EHA: Priority 3 request allowed for DN=smckee@ultralight.org
EHA: request scheduling details
EHA: Lightpath prior scheduling (higher/same priority) precludes use
EHA: Virtual pipe sizeable to 3 Gbps available for 1 hour starting in 52.4 minutes
EHA: request monitoring prediction along path
EHA: FAST-UL transfer expected to deliver 1.2 Gbps (+0.8/-0.4) averaged over next 2 hours 50 minutes
23
EHA: Virtual pipe (partial) expected to deliver 3 Gbps (+0/-0.3) during reservation; variance from unprotected section < 0.3 Gbps 95% CL
EHA: Recommendation: begin transfer using FAST-UL using network identifier #5A-3C1. Connection will migrate to MPLS/QoS tunnel in 52.3 minutes. Estimated completion in 1 hour 22.78 minutes.
FTS: Initiating transfer between mercury.ultralight.org and venus.ultralight.org using #5A-3C1
EHA: Transfer initiated...tracking at URL: fts://localhost/FTS/AE13FF132-FAFE39A-44-5A-3C1
EHA: Reservation placed for MPLS/QoS connection along partial path: 3 Gbps beginning in 52.2 minutes; duration 60 minutes
EHA: Reservation confirmed, rescode #9FA-39AF2E, note: unprotected network section included
FTS: Transfer proceeding, average 1.1 Gbps, 431.3 GB transferred
EHA: Connecting to reservation: tunnel complete, traffic marking initiated
EHA: Virtual pipe active: current rate 2.98 Gbps, estimated completion in 34.35 minutes
FTS: Transfer complete, signaling EHA on #5A-3C1
EHA: Transfer complete received...hold for xsum confirmation
FTS: Remote checksum processing initiated...
FTS: Checksum verified; closing connection
EHA: Connection #5A-3C1 completed...closing virtual pipe with 12.3 minutes remaining on reservation
EHA: Resources freed. Transfer details uploading to monitoring node
EHA: Request successfully completed, transferred 1.2 TB in 1 hour 41.3 minutes (transfer 1 hour 34.4 minutes)
24
Supercomputing 2005
The Supercomputing conference (SC05) in Seattle, Washington held another "Bandwidth Challenge" during the week of Nov 14-18.
A collaboration of high-energy physicists from Caltech, Michigan, Fermilab and SLAC (with help from BNL: thanks Frank and John!) won, achieving 131 Gbps peak network usage.
This SC2005 BWC entry from HEP was designed to preview the scale and complexity of data operations among many sites interconnected with many 10 Gbps links.
26
Total Transfer in 24 hours
27
BWC Take-Away Summary
Our collaboration previewed the IT challenges of next-generation science at the high-energy physics frontier (for the LHC and other major programs):
- Petabyte-scale datasets
- Tens of national and transoceanic links at 10 Gbps (and up)
- 100+ Gbps aggregate data transport sustained for hours; we reached a Petabyte/day transport rate for real physics data
The team set the scale and learned to gauge the difficulty of the global networks and transport systems required for the LHC mission:
- Set up, shook down and successfully ran the system in < 1 week
Substantive take-aways from this marathon exercise:
- An optimized Linux (2.6.12 + FAST + NFSv4) kernel for data transport, after 7 full kernel-build cycles in 4 days
- A newly optimized application-level copy program, bbcp, that matches the performance of iperf under some conditions
- Extensions of Xrootd, an optimized low-latency file access application for clusters, across the wide area
- Understanding of the limits of 10 Gbps-capable systems under stress
28
UltraLight and ATLAS
UltraLight has deployed and instrumented an UltraLight network and made good progress toward defining and constructing the needed "managed network" infrastructure. The developments in UltraLight are targeted at providing needed capabilities and infrastructure for the LHC.
We have some important activities which are ready for additional effort:
- Achieving 10GE disk-to-disk transfers using single servers
- Evaluating TCP congestion control protocols over UL links
- Deploying embryonic network services to further the UL vision
- Implementing some forms of MPLS/QoS and optical path control as part of standard UltraLight operation
- Enabling automated end-host tuning and negotiation
We want to extend the footprint of UltraLight to include as many interested sites as possible, to help ensure its developments meet the LHC's needs.
Questions?
29
Michigan Setup for BWC
30
Effort at Michigan
Michigan connected three wavelengths for SC2005 and was supported by the School of Information, ITCOM, CITI and the Medical School for this BWC. We were able to fill almost 30 Gbps during the BWC.
31
Details of Transfer Amounts
Within 2 hours an aggregate of 95.37 TB (terabytes) was transferred, with sustained transfer rates ranging from 90 Gbps to 150 Gbps and a measured peak of 151 Gbps.
During the whole day (24 hours) on which the bandwidth challenge took place, approximately 475 TB were transferred. This number (475 TB) is lower than the Caltech/SLAC/FNAL/Michigan-led team was capable of, as they did not always have exclusive access to waves outside the bandwidth challenge time slot.
If you scale the 2-hour period in which 95.37 TB was transferred by 12 (to represent a whole day), you get approximately 1.1 PB (petabytes). Transferring this amount of data in 24 hours is equivalent to a transfer rate of 3.8 (DVD) movies per second, assuming an average size of 3.5 GB per movie.
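A quick check of the arithmetic on this slide (treating TB and PB as decimal units):

```python
# A quick check of the slide's arithmetic (TB and PB treated as decimal units).
two_hour_tb = 95.37
day_tb = two_hour_tb * 12                       # scale 2 hours up to 24 hours
movies_per_s = day_tb * 1000 / 86400 / 3.5      # 3.5 GB per DVD movie
print(f"{day_tb / 1000:.2f} PB/day, {movies_per_s:.1f} movies/s")  # ~1.14 PB/day, ~3.8 movies/s
```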