31 October 2005 1 The ongoing evolution from Packet based networks to Hybrid Networks in Research & Education Networks Olivier Martin, CERN Swiss ICT Task.

Presentation transcript:

31 October 2005 The ongoing evolution from Packet based networks to Hybrid Networks in Research & Education Networks Olivier Martin, CERN Swiss ICT Task Force (Fribourg)

31 October 2005 Swiss ICT Task Force Slide 2 Presentation Outline
- The demise of conventional packet-based networks in the R&E community
- The advent of community-managed dark fiber networks
- The Grid & its associated Wide Area Networking challenges
- « On-Demand Lambda Grids »
- Ethernet over SONET & new standards
  - WAN-PHY, GFP, VCAT/LCAS, G.709, OTN
Disclaimer: The views expressed herein are not necessarily those of CERN. Furthermore, although I am formally a CERN staff member until July 31, 2006, I have not worked for CERN since October 3, being on a pre-retirement program.


31 October 2005 Swiss ICT Task Force Slide 4 [Figure: system capacity (Mbit/s) by year. Series: optical DWDM capacity (135 Mbit/s, 565 Mbit/s, 1.7 Gbit/s, then multiples of 10 Gbit/s wavelengths up to 160 x 10 Gbit/s), Ethernet (Ethernet, Fast Ethernet, GigE, 10-GE, 40-GE) and Internet backbone (T1, T3, OC-3c, OC-12c, OC-48c, OC-192c, OC-768c). Annotation: I/O rates = optical wavelength capacity.]

31 October 2005 Swiss ICT Task Force Slide 5 Some facts
- The Internet is everywhere
- Ethernet is everywhere
- The advent of next-generation G.709 Optical Transport Networks is very uncertain! Hence one has to learn how to live best with existing network infrastructures, which may well explain all the "hype" about "on-demand" lambda Grids!
- For the first time in the history of the Internet, the commercial and the Research & Education Internet appear to follow different routes
  - Will they ever converge again?
- Dark fiber based, customer-owned, long-distance networks are booming!
  - Users are becoming their own telecom operators
  - Is it a good or a bad thing?

Internet Backbone Speeds [Figure: backbone speed in Mbit/s over time. Labels: T1 lines, T3 lines, OC3c, OC12c, IP/ATM-VCs.]

High Speed IP Network Transport Trends [Figure: protocol stack panels titled B-ISDN, IP over ATM, IP over SONET/SDH and IP over Optical, built from the layers IP, ATM, SONET/SDH and Optical. Annotations: higher speed, lower cost, complexity and overhead; multiplexing, protection and management at every layer; signalling.]


October 12, 2001 Intro to Grid Computing and Globus Toolkit™ (slide 10) Network Exponentials
- Network vs. computer performance
  - Computer speed doubles every 18 months
  - Network speed doubles every 9 months
  - Difference = order of magnitude per 5 years
- 1986 to 2000
  - Computers: x 500
  - Networks: x 340,000
- 2001 to 2010
  - Computers: x 60
  - Networks: x 4000
Moore's Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan. 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner, Caufield and Perkins.
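The doubling periods quoted on this slide can be turned into growth factors with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the only inputs taken from the slide are the 18-month and 9-month doubling periods.

```python
# Growth factors implied by fixed doubling periods (illustrative sketch).
COMPUTER_DOUBLING_MONTHS = 18.0  # from the slide
NETWORK_DOUBLING_MONTHS = 9.0    # from the slide


def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth over `years` when capacity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def network_vs_computer_gap(years: float) -> float:
    """How much faster networks grow than computers over `years`."""
    return (growth_factor(years, NETWORK_DOUBLING_MONTHS)
            / growth_factor(years, COMPUTER_DOUBLING_MONTHS))


if __name__ == "__main__":
    for label, years in (("5 years", 5), ("1986 to 2000", 14), ("2001 to 2010", 9)):
        computers = growth_factor(years, COMPUTER_DOUBLING_MONTHS)
        networks = growth_factor(years, NETWORK_DOUBLING_MONTHS)
        print(f"{label}: computers x{computers:,.0f}, networks x{networks:,.0f}, "
              f"gap x{network_vs_computer_gap(years):,.1f}")
```

Over 5 years the gap comes out at roughly a factor of ten, which is the "order of magnitude per 5 years" claim. The factors for the two longer periods land in the same range as the slide's rounded figures rather than matching them exactly.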

Know the user [Figure: number of users versus bandwidth requirements, with axis marks at ADSL and GigE LAN and three user classes A, B, C.]
- A -> Lightweight users: browsing, mailing, home use
- B -> Business applications: multicast, streaming, VPNs, mostly LAN
- C -> Special scientific applications: computing, data grids, virtual presence

What the user [Figure: total bandwidth versus bandwidth requirements, with axis marks at ADSL and GigE LAN and the same three user classes A, B, C.]
- A -> Need full Internet routing, one to many
- B -> Need VPN services on/and full Internet routing, several to several
- C -> Need very fat pipes, limited multiple Virtual Organizations, few to few

So what are the facts
- Costs of fat pipes (fibers) are one third of the cost of the equipment to light them up
  - This is what lambda salesmen told Cees de Laat (University of Amsterdam & SURFnet)
- Costs of optical equipment: 10% of switching, 10% of full routing equipment, for the same throughput
  - A 100-byte packet at 10 Gb/s leaves 80 ns to look up in a 100 MByte routing table (the time light takes from me to you on the back row!)
- Big sciences need fat pipes
- Bottom line: create a hybrid architecture which serves all users in one coherent and cost-effective way
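The 80 ns figure is simply the time a 100-byte packet occupies a 10 Gb/s link, which is the budget a router has for one lookup at line rate. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the speed of light in vacuum is used for the "back row" comparison.

```python
# Per-packet time budget at line rate (illustrative sketch).
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum


def serialization_time_s(packet_bytes: int, link_bps: float) -> float:
    """Time a packet of `packet_bytes` occupies a link running at `link_bps`."""
    return packet_bytes * 8 / link_bps


def light_distance_m(seconds: float) -> float:
    """Distance light travels in vacuum during `seconds`."""
    return seconds * SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    budget = serialization_time_s(100, 10e9)
    print(f"lookup budget: {budget * 1e9:.0f} ns")
    print(f"light travels about {light_distance_m(budget):.0f} m in that time")
```

The distance works out to about 24 m, roughly the length of a lecture room.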

Utilization trends [Figure: utilization in Gbps over time against the network capacity limit, with a marker at Jan 2005.]

Today's hierarchical IP network [Figure: hierarchy diagram. Labels: University, Regional, National or Pan-National IP Network, Other national networks, NREN A, NREN B, NREN C, NREN D.]

Tomorrow's peer to peer IP network [Figure: peer to peer diagram. Labels: World, University, Regional, Server, National DWDM Network, NREN A, NREN B, NREN C, NREN D, Child Lightpaths.]

Creation of application VPNs [Figure: application networks alongside the Commodity Internet. Labels: Bio-informatics Network, High Energy Physics Network, eVLBI Network, University, CERN, Dept Research Network. Annotation: direct connect bypasses campus firewall.]

Production vs Research Campus Networks
- Increasingly, campuses are deploying parallel networks for high-end users
- This reduces costs by providing high-end network capability only to those who need it
- Limitations of the campus firewall and border router are eliminated
- Many issues remain regarding security, back-door routing, etc.
- Campus networks may follow the same evolution as campus computing
- Discipline-specific networks are being extended into the campus

UCLP intended for projects like National LambdaRail: CAVEwave acquires a separate wavelength between Seattle and Chicago and wants to manage it as part of its network, including add/drop, routing, partition, etc. [Figure labels: NLR Condominium lambda network; Original CAVEwave.]

GEANT2 POP Design

31 October 2005 Swiss ICT Task Force Slide 21 LHC Data Grid Hierarchy [Figure: tiered data grid. Tier 0 +1: Experiment, Online System and CERN (700k SI95, ~1 PB disk, tape robot). Tier 1: FNAL (200k SI95, 600 TB), IN2P3 Center, INFN Center, RAL Center. Tier 2: Tier2 Centers. Tier 3: institutes (~0.25 TIPS) with a physics data cache. Tier 4: workstations. Link labels: ~PByte/sec, ~ MBytes/sec, 10 Gbps, 2.5/10 Gbps, ~2.5 Gbps, 0.1–1 Gbps.] Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels. CERN/outside resource ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1.

31 October 2005 Swiss ICT Task Force Slide 22 Main Networking Challenges
- Fulfill the as yet unproven assertion that the network can be « nearly » transparent to the Grid
- Deploy suitable Wide Area Network infrastructure ( Gb/s)
- Deploy suitable Local Area Network infrastructure (matching or exceeding that of the WAN)
- Seamless interconnection of LAN & WAN infrastructures: firewall?
- End-to-end issues (transport protocols, PCs (Itanium, Xeon), 10GigE NICs (Intel, S2io)). Where are we today:
  - memory to memory: 7.5 Gb/s (PCI bus limit)
  - memory to disk: 1.2 GB/s (Windows 2003 server/NewiSys)
  - disk to disk: 400 MB/s (Linux), 600 MB/s (Windows)

31 October 2005 Swiss ICT Task Force Slide 23 Main TCP issues
- Does not scale to some environments
  - High speed, high latency
  - Noisy
- Unfair behaviour with respect to:
  - Round Trip Time (RTT)
  - Frame size (MSS)
  - Access bandwidth
- Widespread use of multiple streams in order to compensate for inherent TCP/IP limitations (e.g. GridFTP, BBftp):
  - A bandage rather than a cure
- New TCP/IP proposals in order to restore performance in single-stream environments
  - Not clear if/when they will have a real impact
  - In the meantime there is an absolute requirement for backbones with:
    - Zero packet losses
    - No packet re-ordering
  - Which reinforces the case for "lambda Grids"

31 October 2005 Swiss ICT Task Force Slide 24 TCP dynamics (10 Gbps, 100 ms RTT, 1500-byte packets)
- Window size (W) = Bandwidth * Round Trip Time
  - Wbits = 10 Gbps * 100 ms = 1 Gb
  - Wpackets = 1 Gb / (8 * 1500) = 83,333 packets
- Standard Additive Increase Multiplicative Decrease (AIMD) mechanisms:
  - W = W/2 (halving the congestion window on a loss event)
  - W = W + 1 (increasing the congestion window by one packet every RTT)
- Time to recover from W/2 to W (congestion avoidance) at 1 packet per RTT:
  - RTT * Wpackets / 2 = 1.16 hours
  - In practice, 1 packet per 2 RTT because of delayed ACKs, i.e. 2.31 hours
- Packets per second:
  - Wpackets / RTT = 833,333 packets
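The quantities on this slide follow directly from the three parameters in its title (10 Gbps, 100 ms RTT, 1500-byte packets). The sketch below recomputes them; the function names are hypothetical, and the `rtts_per_increment` parameter is an illustrative way to model the delayed-ACK case mentioned on the slide.

```python
# Window size, AIMD recovery time and packet rate for one TCP stream (sketch).

def window_packets(bandwidth_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Bandwidth*delay product expressed in packets."""
    window_bits = bandwidth_bps * rtt_s
    return window_bits / (8 * packet_bytes)


def aimd_recovery_time_s(w_packets: float, rtt_s: float, rtts_per_increment: int = 1) -> float:
    """Time to grow from W/2 back to W, adding one packet every `rtts_per_increment` RTTs."""
    return rtt_s * rtts_per_increment * w_packets / 2


def packets_per_second(w_packets: float, rtt_s: float) -> float:
    """One full window is sent per round trip."""
    return w_packets / rtt_s


if __name__ == "__main__":
    w = window_packets(10e9, 0.100, 1500)
    print(f"window: {w:,.0f} packets")
    print(f"recovery: {aimd_recovery_time_s(w, 0.100) / 3600:.2f} h")
    print(f"recovery with delayed ACKs: {aimd_recovery_time_s(w, 0.100, 2) / 3600:.2f} h")
    print(f"packet rate: {packets_per_second(w, 0.100):,.0f} packets/s")
```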

31 October 2005 Swiss ICT Task Force Slide 25 Internet2 land speed record history (IPv4 & IPv6) period

31 October 2005 Swiss ICT Task Force Slide 26 Layer1/2/3 networking (1)
- Conventional layer 3 technology is no longer fashionable because of:
  - High associated costs, e.g. 200/300 KUSD for a 10G router interface
  - Implied use of shared backbones
- The use of layer 1 or layer 2 technology is very attractive because it helps to solve a number of problems, e.g.
  - 1500-byte Ethernet frame size (layer 1)
  - Protocol transparency (layer 1 & layer 2)
  - Minimum functionality, hence, in theory, much lower costs (layer 1 & 2)

31 October 2005 Swiss ICT Task Force Slide 27 Layer1/2/3 networking (2)
« On-demand Lambda Grids » are becoming very popular:
- Pros:
  - Circuit-oriented model like the telephone network, hence no need for complex transport protocols
  - Lower equipment costs (i.e. « in theory » a factor of 2 or 3 per layer)
  - The concept of a dedicated end-to-end light path is very elegant
- Cons:
  - « End to end » is still very loosely defined, i.e. site to site, cluster to cluster or really host to host
  - Higher circuit costs, scalability, additional middleware to deal with circuit set-up/tear-down, etc.
  - Extending dynamic VLAN functionality is a potential nightmare!

31 October 2005 Swiss ICT Task Force Slide 28 « Lambda Grids »: what does it mean?
- Clearly different things to different people, hence the apparently easy consensus!
- Conservatively, on-demand « site to site » connectivity
  - Where is the innovation?
  - What does it solve in terms of transport protocols?
  - Where are the savings?
    - Fewer interfaces needed (customer) but more standby/idle circuits needed (provider)
    - Economics from the service provider vs the customer perspective?
      - Traditionally, switched services have been very expensive
        - Usage vs flat charge
        - Break-even, switched vs leased: a few hours/day
        - Why would this change?
    - In case there are no savings, why bother?
- More advanced, cluster to cluster
  - Implies even more active circuits in parallel
  - Is it realistic?
- Even more advanced, host to host or even « per flow »
  - All optical
  - Is it really realistic?

31 October 2005 Swiss ICT Task Force Slide 29 Some Challenges
- Real bandwidth estimates, given the chaotic nature of the requirements
- End-to-end performance, given the whole chain involved
  - (disk-bus-memory-bus-network-bus-memory-bus-disk)
- Provisioning over complex network infrastructures (GEANT, NRENs, etc.)
- Cost model for options (packet + SLAs, circuit switched, etc.)
- Consistent performance (dealing with firewalls)
- Merging leading-edge research with production networking

31 October 2005 Swiss ICT Task Force Slide 30 Tentative conclusions
- There is a very clear trend towards community-managed dark fiber networks
  - As a consequence, National Research & Education Networks are evolving into telecom operators. Is it right?
    - In the short term, almost certainly YES
    - In the longer term, probably NO
  - In many countries, there is NO other way to have affordable access to multi-Gbit/s networks, therefore this is clearly the right move
- The Grid & its associated Wide Area Networking challenges
  - « On-demand Lambda Grids » are, in my opinion, extremely doubtful!
- Ethernet over SONET & new standards will revolutionize the Internet
  - WAN-PHY (IEEE) has, in my opinion, NO future!
  - However, GFP, VCAT/LCAS, G.709 and OTN are very likely to have a very bright future.

Single TCP stream performance under periodic losses
Bandwidth available = 1 Gbps. Loss rate = 0.01%: LAN BW utilization = 99%; WAN BW utilization = 1.2%.
- TCP throughput is much more sensitive to packet loss in WANs than in LANs
- TCP's congestion control algorithm (AIMD) is not well suited to gigabit networks
- The effect of packet loss can be disastrous
- TCP is inefficient in high bandwidth*delay networks
- The future performance outlook for computational grids looks bad if we continue to rely solely on the widely deployed TCP Reno
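One common way to reason about these utilization figures, though not necessarily the model behind the slide, is the approximation by Mathis et al. for Reno-style throughput under periodic loss: throughput ≈ (MSS / RTT) * sqrt(3 / (2p)). The sketch below applies it to a 1 Gbps link at a 0.01% loss rate. The MSS of 1460 bytes and the RTTs (0.04 ms for a LAN, 120 ms for a transatlantic WAN) are assumptions for illustration; the slide does not state them.

```python
# Loss-limited TCP throughput, Mathis et al. approximation (illustrative sketch).
import math


def reno_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float, link_bps: float) -> float:
    """Estimated steady-state throughput, capped at the link capacity."""
    estimate = (mss_bytes * 8 / rtt_s) * math.sqrt(1.5 / loss_rate)
    return min(estimate, link_bps)


def utilization(mss_bytes: int, rtt_s: float, loss_rate: float, link_bps: float) -> float:
    """Fraction of the link a single stream can fill."""
    return reno_throughput_bps(mss_bytes, rtt_s, loss_rate, link_bps) / link_bps


if __name__ == "__main__":
    LINK_BPS = 1e9      # from the slide
    LOSS_RATE = 1e-4    # 0.01%, from the slide
    MSS_BYTES = 1460    # assumption
    for name, rtt_s in (("LAN", 0.00004), ("WAN", 0.120)):  # RTTs are assumptions
        print(f"{name}: {utilization(MSS_BYTES, rtt_s, LOSS_RATE, LINK_BPS):.1%}")
```

With these assumed inputs the WAN case comes out at about 1.2% of the link, while the LAN case is limited only by the link itself (the simple cap yields 100% where the slide reports 99%).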

Responsiveness
Time to recover from a single packet loss: C * RTT^2 / (2 * MSS), where C is the capacity of the link.
Path | Bandwidth | RTT (ms) | MTU (Byte) | Time to recover
LAN | 10 Gb/s | | | ms
Geneva–Chicago | 10 Gb/s | | | hr 32 min
Geneva-Los Angeles | 1 Gb/s | | | min
Geneva-Los Angeles | 10 Gb/s | | | hr 51 min
Geneva-Los Angeles | 10 Gb/s | | | min
Geneva-Los Angeles | 10 Gb/s | | k (TSO) | 5 min
Geneva-Tokyo | 1 Gb/s | | | hr 04 min
- Large MTU accelerates the growth of the window
- Time to recover from a packet loss decreases with large MTU
- Larger MTU reduces overhead per frame (saves CPU cycles, reduces the number of packets)
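The recovery-time expression on this slide, C * RTT^2 / (2 * MSS), can be evaluated directly to see how strongly RTT and frame size matter. In the sketch below the function name is hypothetical, and the example inputs (a 180 ms RTT and MSS values of 1460 and 8960 bytes) are assumptions chosen for illustration, since the RTT and MTU columns did not survive in this transcript.

```python
# Time for AIMD to recover from a single loss: C * RTT^2 / (2 * MSS) (sketch).

def recovery_time_s(capacity_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Seconds needed to return to full rate after one loss on a link of `capacity_bps`."""
    return capacity_bps * rtt_s ** 2 / (2 * mss_bytes * 8)


def as_hours_minutes(seconds: float) -> str:
    """Render a duration as 'H hr MM min'."""
    total_minutes = round(seconds / 60)
    return f"{total_minutes // 60} hr {total_minutes % 60:02d} min"


if __name__ == "__main__":
    RTT_S = 0.180  # assumption: a long transatlantic plus transcontinental path
    for mss_bytes in (1460, 8960):  # assumptions: standard and jumbo frames minus headers
        t = recovery_time_s(10e9, RTT_S, mss_bytes)
        print(f"10 Gb/s, RTT {RTT_S * 1000:.0f} ms, MSS {mss_bytes}: {as_hours_minutes(t)}")
```

Because the MSS sits in the denominator, a frame six times larger cuts the recovery time by the same factor, which is the point of the bullets on large MTUs. The quadratic dependence on RTT is what separates the LAN row from the intercontinental ones.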

Single TCP stream between Caltech and CERN
- Available (PCI-X) bandwidth = 8.5 Gbps
- RTT = 250 ms (16,000 km)
- 9000-byte MTU
- 15 min to increase throughput from 3 to 6 Gbps
- Sending station: Tyan S2882 motherboard, 2x Opteron 2.4 GHz, 2 GB DDR
- Receiving station: CERN OpenLab: HP rx4640, 4x 1.5 GHz Itanium-2, zx1 chipset, 8 GB memory
- Network adapter: S2IO 10 GbE
[Figure annotations: burst of packet losses; single packet loss; CPU load = 100%.]

High Throughput Disk to Disk Transfers: From 0.1 to 1 GByte/sec
- Server hardware (rather than network) bottlenecks:
  - Write/read and transmit tasks share the same limited resources: CPU, PCI-X bus, memory, IO chipset
  - PCI-X bus bandwidth: 8.5 Gbps [133 MHz x 64 bit]
- Link aggregation (802.3ad): logical interface with two physical interfaces on two independent PCI-X buses
  - LAN test: 11.1 Gbps (memory to memory)
Performance in this range (from 100 MByte/sec up to 1 GByte/sec) is required to build a responsive Grid-based Processing and Analysis System for LHC.
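The 8.5 Gbps PCI-X figure is the bus clock multiplied by the bus width, and comparing it with the 1 GByte/sec target shows how little headroom a single bus leaves. A small sketch of that arithmetic, with hypothetical function names:

```python
# Peak PCI-X bus bandwidth versus the disk-to-disk throughput target (sketch).

def bus_bandwidth_bps(clock_hz: float, width_bits: int) -> float:
    """Theoretical peak of a parallel bus: one transfer of `width_bits` per clock."""
    return clock_hz * width_bits


def gbytes_per_s_to_bps(gbytes_per_s: float) -> float:
    """Convert decimal GByte/sec to bit/s."""
    return gbytes_per_s * 8e9


if __name__ == "__main__":
    pci_x_bps = bus_bandwidth_bps(133e6, 64)   # 133 MHz x 64 bit, from the slide
    target_bps = gbytes_per_s_to_bps(1.0)      # upper end of the required range
    print(f"PCI-X peak: {pci_x_bps / 1e9:.2f} Gbps")
    print(f"1 GByte/sec target uses {target_bps / pci_x_bps:.0%} of one bus")
```

This is a theoretical peak, so real transfers get less. That is consistent with the slide's use of two NICs on independent buses to exceed 10 Gbps.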

Transferring a TB from Caltech to CERN in 64-bit MS Windows
- Latest disk to disk over 10 Gbps WAN: 4.3 Gbits/sec (536 MB/sec), 8 TCP streams from CERN to Caltech; 1 TB file
- 3 Supermicro Marvell SATA disk controllers + 24 7200 rpm SATA disks
  - Local disk IO: 9.6 Gbits/sec (1.2 GBytes/sec read/write, with <20% CPU utilization)
- S2io SR 10GE NIC
  - 10 GE NIC: 7.5 Gbits/sec (memory-to-memory, with 52% CPU utilization)
  - 2*10 GE NIC (802.3ad link aggregation): 11.1 Gbits/sec (memory-to-memory)
  - Memory-to-memory WAN data flow and local memory-to-disk read/write flow are not matched when combining the two operations
- Quad Opteron AMD GHz processors with 3 AMD-8131 chipsets: 4 64-bit/133 MHz PCI-X slots
- Interrupt Affinity Filter: allows a user to change the CPU affinity of the interrupts in a system
- Overcome packet loss with re-connect logic
- Proposed Internet2 Terabyte File Transfer Benchmark
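To put the 4.3 Gbits/sec disk-to-disk rate in perspective, the sketch below estimates how long a 1 TB file takes at a sustained rate. It assumes a decimal terabyte (10^12 bytes) and a constant rate with no pauses; the function name is hypothetical.

```python
# Transfer time for a file at a sustained rate (illustrative sketch).

def transfer_time_s(file_bytes: float, rate_bps: float) -> float:
    """Seconds needed to move `file_bytes` at a constant `rate_bps`."""
    return file_bytes * 8 / rate_bps


if __name__ == "__main__":
    ONE_TB_BYTES = 1e12  # assumption: decimal terabyte
    for label, rate_bps in (("disk to disk, 4.3 Gbits/sec", 4.3e9),
                            ("memory to memory, 7.5 Gbits/sec", 7.5e9)):
        minutes = transfer_time_s(ONE_TB_BYTES, rate_bps) / 60
        print(f"{label}: {minutes:.0f} min")
```

At the measured disk-to-disk rate the terabyte moves in about half an hour. The memory-to-memory rate is included only to show what the network path alone would allow.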