End to end lightpaths for large file transfer over fast long-distance networks Jan 29, 2003 Bill St. Arnaud, Wade Hong, Geoff Hayward, Corrie Kost, Bryan Caron, Steve MacDonald

Problem
1. TCP throughput over long fat pipes is very susceptible to packet loss, MTU, the TCP kernel implementation, buffer memory, AQM optimized for the commodity Internet, etc.
2. Packet loss can result from congestion, but also from the underlying BER
– To achieve a gigabit per second with TCP on a coast-to-coast path (RTT = 40 msec) with 1500 byte packets, the loss rate cannot exceed 8.5x10^-8 packets
– "End to end" BER for optical networks is 10^-12 to 10^-15, which means a packet loss rate of approximately 10^-8 to 10^-11
– The bigger the packet, the greater the loss rate!!!
3. The cost of routers is significantly greater than that of switches at 10 Gbps and higher (particularly for a large number of lambdas)
4. Lots of challenges in maintaining consistent router performance across multiple independently managed networks
– MTU, auto-negotiating Ethernet, insufficient buffer memory
5. Consistent and similar throughput to multiple sites is required to maintain coherency for grids, SANs and new "space" storage networks using erasure codes, e.g. OceanStore
6. For maximum throughput, OS and kernel bypass may be required
7. Many commercial SAN/Grid products will only work with a QoS network
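The 8.5x10^-8 figure above can be reproduced with the simplified Mathis-style TCP throughput relation, throughput ≈ MSS / (RTT * sqrt(p)). A quick sketch, assuming a 1460 byte MSS from the 1500 byte MTU, the 40 msec coast-to-coast RTT, and a 1 Gbps target:

```python
# Maximum tolerable packet-loss rate for a target TCP throughput, using the
# simplified Mathis formula: throughput ~= MSS / (RTT * sqrt(p)).
# Assumed inputs: 1460-byte MSS (1500-byte MTU), 40 ms RTT, 1 Gbps target.

def max_loss_rate(mss_bytes: int, rtt_s: float, target_bps: float) -> float:
    """Solve for p in: target_bps = (mss_bytes * 8) / (rtt_s * sqrt(p))."""
    mss_bits = mss_bytes * 8
    return (mss_bits / (rtt_s * target_bps)) ** 2

p = max_loss_rate(mss_bytes=1460, rtt_s=0.040, target_bps=1e9)
print(f"max loss rate: {p:.2e}")  # ~8.5e-08, matching the slide's figure
```

The same relation also shows why loss hurts more as the pipe gets longer or faster: the tolerable loss rate falls with the square of the bandwidth-delay product.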

Possible Solutions
1. For point to point large file transfer, a number of possible techniques such as FAST, XCP, parallel TCP, UDP, etc.
– Very scalable, and allows the same process to be used for all sorts of file transfer, from large to small
– But will it address other link problems?
2. Datagram QoS is a possibility to guarantee bandwidth
– But it requires costly routers, and there is no proven approach across independently managed networks (or campuses)
– Does not solve the problems of MTU, link quality, etc.
3. E2E lightpaths - all solutions are possible
– Allows new TCP and non-TCP file transfers
– Allows parallel TCP with consistent skew on data striping
– Allows protocols that support OS bypass, etc.
– Guarantees consistent throughput for distributed coherence, and enables new concepts of storing large data sets in "space"
– Uses much lower cost switches and bypasses routers

What are E2E lightpaths?
>Customer controlled E2E lightpaths are not about optical networking
– E2E lightpaths do not use GMPLS or ASON
>The power of the Internet was that an overlay packet network, controlled by end users and ISPs, could be built on top of the telco switched network
– CA*net 4 is an optical overlay network on top of the telco optical network, where switching is controlled by end users
>More akin to the MAE-E "peermaker", but at a finer granularity
– "Do you have an e2e lightpath for file transfer terminating at a given IX? Are you interested in peering with my e2e lightpath to enable big file transfer?"
– The lightpath may be only from border router to border router
>With OBGP, a new BGP path can be established that bypasses most (if not all) routers
– Allows lower cost remote peering and transit
– Allows e2e lightpaths for big file transfer

e2e Lightpaths
[Diagram: a normal IP/BGP path and an OBGP path between routers x.x.x.1 and y.y.y.1. Only y.y.y.1 is advertised to x.x.x.1 via the OBGP path, and only x.x.x.1 is advertised to y.y.y.1 via the OBGP path. An optical "Peermaker" application or end user controls peering of BGP optical paths for the transfer of elephants!!!]

CA*net 4
[Map: CA*net 4 OC192 topology spanning Victoria, Vancouver, Kamloops, Calgary, Edmonton, Saskatoon, Regina, Winnipeg, Thunder Bay, Sudbury, Windsor, London, Hamilton, Toronto, Kingston, Ottawa, Montreal, Quebec City, Fredericton, Charlottetown, Halifax and St. John's, with breakouts toward Seattle, Minneapolis, Chicago, Buffalo, Albany, Boston and New York. Legend marks CA*net 4 nodes, possible future breakouts, and possible future links or options.]

CA*net 4 Architecture Principles
>A network of point to point condominium wavelengths
– Do not confuse with traditional optical solutions like GMPLS or ASON
>Grid service architecture for user control and management of e2e lightpaths
– Uses OGSA and Jini/JavaSpaces for end to end customer control
>Owners of wavelengths determine the topology and routing of their particular lightpaths
>All wavelengths terminate at mini-IXs, where the owner can
– add/drop an STS channel or wavelength
– cross connect to another condominium owner's STS channels or wavelengths
– use a web service enabled "peermaker"
>A condominium owner can recursively sub-partition their wavelengths and give ownership to other entities
>Wavelengths become objects, complete with polymorphism, inheritance, classes, etc.
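The "wavelengths become objects" and recursive sub-partitioning ideas can be sketched as a tiny object model. All names here (Lightpath, partition, the owners) are hypothetical illustrations, not the actual UCLP API; the only factual anchor is that an OC-192 wavelength carries 192 STS-1 channels:

```python
# Hypothetical sketch of the condominium-wavelength object model: an owner
# can recursively carve STS channels out of a wavelength and hand ownership
# of the pieces to other entities.

class Lightpath:
    def __init__(self, owner: str, sts_channels: int):
        self.owner = owner
        self.sts_channels = sts_channels      # capacity still held by owner
        self.children: list["Lightpath"] = [] # sub-partitions given away

    def partition(self, new_owner: str, sts_channels: int) -> "Lightpath":
        """Carve off STS channels into a sub-lightpath with a new owner."""
        if sts_channels > self.sts_channels:
            raise ValueError("not enough capacity")
        self.sts_channels -= sts_channels
        child = Lightpath(new_owner, sts_channels)
        self.children.append(child)
        return child

wave = Lightpath("CANARIE", sts_channels=192)       # one OC-192 wavelength
edu = wave.partition("UofOttawa", sts_channels=24)  # sub-partition, new owner
lab = edu.partition("PhysicsLab", sts_channels=12)  # recursive sub-partition
print(wave.sts_channels, edu.sts_channels, lab.sts_channels)  # 168 12 12
```

Because a sub-partition is itself a Lightpath, the new owner can repeat the operation, which is exactly the recursion the slide describes.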

So what did we do?
>Demonstrated a manually provisioned "e2e" lightpath
>Transferred 1 TB of ATLAS MC data generated in Canada from TRIUMF to CERN
– 17,000 km path
– […] byte MTU
>10 GbE NIC cards on the servers
>Demonstrated a number of techniques for large file transfer, including bbftp and tsunami
>Demonstrated a wide area SAN using un-modified "out of the box" parallel TCP, with bidirectional data rates of 5 to 6 Gbps

Comparative Results (TRIUMF to CERN)

Transfer Program                 Transferred   Average       Max         AvgPBMPS
wuftp 100 MbE (1500 byte MTU)    600 MB        3.4 Mbps                  0.0578
wuftp 10 GbE                     6442 MB       71 Mbps                   1.207
iperf (10 streams)               275 MB        940 Mbps      1136 Mbps
pftp (Germany)                   600 MB        532 Mbps                  9.044
bbftp (10 streams)               1.4 TB        666 Mbps      710 Mbps    12.07
Tsunami - disk to disk           0.5 TB        700 Mbps      825 Mbps
Tsunami - disk to memory         12 GB         >1.024 Gbps               17.408
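The AvgPBMPS column appears to be throughput multiplied by path length, in petabit-metres per second; this interpretation is an assumption, but it reproduces the table's entries when combined with the 17,000 km TRIUMF-CERN path from the previous slide:

```python
# Check of the (assumed) AvgPBMPS metric: rate x distance in Pb.m/s.
# The 17,000 km path length is taken from the slides.

def pb_metres_per_sec(rate_bps: float, path_km: float) -> float:
    """Throughput-distance product in petabit-metres per second."""
    return rate_bps * path_km * 1000 / 1e15

# bbftp's max rate of 710 Mbps over 17,000 km gives the table's 12.07;
# wuftp's 3.4 Mbps gives 0.0578.
print(round(pb_metres_per_sec(710e6, 17_000), 2))
print(round(pb_metres_per_sec(3.4e6, 17_000), 4))
```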

TRIUMF to CERN Topology

Exceeding 1 Gbit/sec … (using tsunami)

Lessons Learned - 1
>10 GbE appears to be plug and play
>Channel bonding of two GbEs seems to work very well (on an unshared link!)
>Linux software RAID is faster than most conventional SCSI and IDE hardware RAID
>More disk spindles is better
– distributed across multiple controllers and I/O bridges
>The larger the files, the better the throughput
>Very lucky - no hardware failures (50 drives)

Lessons Learned - 2
>Unless programs are multi-threaded or the kernel permits process locking, dual CPUs will not give the best performance
– a single 2.8 GHz CPU is likely to outperform dual 2.0 GHz CPUs for a single purpose machine like our fileserver
>More memory is better
>Concatenating, compressing and deleting files takes longer than transferring them
>Never quote numbers when asked :)

Yotta Yotta "Optic Boom"
>Uses parallel TCP with data striping
>Each TCP channel is assigned to a given e2e lightpath
>Allows for consistent skew on data striping
>Allows for consistent throughput and coherence for geographically distributed SANs
>Allows synchronization of data sets across multiple servers, ultimately leading to "space" storage
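The striping idea can be sketched as follows. This is a minimal illustration with hypothetical helper names, not Yotta Yotta's implementation; the actual sockets are omitted, and only the data layout is shown: fixed-size blocks are dealt round-robin across the channels, and because each channel rides its own lightpath with similar latency, the blocks arrive with consistent skew and interleave back in order.

```python
# Sketch of fixed-block data striping across N parallel TCP channels
# (one stream per e2e lightpath). Hypothetical helpers; no real sockets.

def stripe(data: bytes, channels: int, block: int) -> list[bytes]:
    """Deal fixed-size blocks round-robin across the channels."""
    out = [bytearray() for _ in range(channels)]
    for i in range(0, len(data), block):
        out[(i // block) % channels] += data[i:i + block]
    return [bytes(b) for b in out]

def unstripe(stripes: list[bytes], block: int) -> bytes:
    """Reassemble by taking one block from each channel in turn."""
    out = bytearray()
    pos = [0] * len(stripes)
    ch = 0
    while pos[ch] < len(stripes[ch]):
        out += stripes[ch][pos[ch]:pos[ch] + block]
        pos[ch] += block
        ch = (ch + 1) % len(stripes)
    return bytes(out)

data = bytes(range(256)) * 64                # 16 KB of test data
stripes = stripe(data, channels=4, block=1024)
assert unstripe(stripes, block=1024) == data  # round-trip is lossless
```

If one channel were much slower than the others (inconsistent skew), the receiver would have to buffer whole stripes before reassembly, which is exactly the problem consistent-latency lightpaths avoid.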

[Diagram: Vancouver - Ottawa - Chicago test configuration; 1 x GE loop-back on OC-24 plus 8 x OC-12 (622 Mb/s); sustained throughput ~11.1 Gbps, average utilization = 93%]

Local Configuration

What Next?
>Continue testing 10 GbE
– try the next iteration of the Intel 10 GbE cards in back to back mode
– bond multiple GbEs across CA*net 4
>Try aggregating multiple sources to fill a 10 Gbps pipe across CA*net 4
>Set up a multiple GbE testbed across CA*net 4 and beyond
>Drive much more than 1 Gbps from a single host

Further Investigations
>Linux TCP/IP network stack performance
– efficient copy routines (zero copy, copy routines, read-copy-update)
>Stream Control Transmission Protocol
>Scheduled Transfer Protocol
– OS bypass and zero copy
– RDMA over IP
>Web100, Net100, DRS
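One of the "efficient copy routines" above, zero copy, is illustrated below using the kernel sendfile path: the file data goes from the page cache straight to the socket buffer without passing through user space. This is a Linux-oriented sketch run over a local socketpair for self-containment; a real transfer would use a connected TCP socket.

```python
# Zero-copy file send via os.sendfile (Linux): the kernel copies file data
# directly from the page cache to the socket, bypassing user-space buffers.
import os
import socket
import tempfile

payload = b"x" * 32768
with tempfile.TemporaryFile() as f:
    f.write(payload)
    f.flush()

    left, right = socket.socketpair()  # stands in for a connected TCP pair
    sent = 0
    while sent < len(payload):
        # offset-based form: does not disturb the file's own position
        sent += os.sendfile(left.fileno(), f.fileno(), sent,
                            len(payload) - sent)
    left.close()

    received = b""
    while len(received) < len(payload):
        received += right.recv(65536)
    right.close()

assert received == payload
```

For plain sockets, socket.sendfile() wraps the same mechanism; the RDMA-over-IP work mentioned above goes one step further by also removing the receive-side copy.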