Data Reservoir: Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research
Mary Inaba, Makoto Nakamura, Kei Hiraki
University of Tokyo, AWOCA 2003

Today’s Topic
- New infrastructure for data-intensive scientific research
- Problems of using the Internet

One day, I was surprised
A professor (Dept. of Astronomy) said: "The network is for e-mail and paper exchange. FedEx is for REAL data exchange." (They use DLT tapes and airplanes.)

Huge Data Producers
- AKEBONO satellite
- Radio telescope in NOBEYAMA
- SUBARU telescope
- KAMIOKANDE (Nobel Prize)
- High-energy accelerator
A lot of data suggests a lot of scientific truth, by computation. Now, we can compute. ⇒ Data-Intensive Research

Huge Data Transfer (inquiry to professors)
Current state: data transfer by DLT, EVERY WEEK.
Expected data size in a few years:
- 10 GB/day for satellite data
- 50 GB/day for the high-energy accelerator
- 50 PB tape archive for Earth simulation
Observatories are shared by many researchers; hence we NEED to bring the data to the lab somehow. Does the network help?

Super-SINET backbone
- Started January 2002; network for universities and institutes
- Combination of a 10 Gbps ordinary line and several 1 Gbps project lines (physics, genome, Grid, etc.)
(diagram: optical cross-connects linking Hokkaido Univ., Tohoku Univ., KEK, Tsukuba Univ., Univ. of Tokyo, NAO, NII, Titech, Waseda, ISAS, Nagoya Univ., Okazaki Labs, Kyoto Univ., Doshisha Univ., Osaka Univ., Kyushu Univ.)

Currently
It is not easy to transfer HUGE data over a long distance while fully utilizing the bandwidth, because TCP/IP is what is popularly used, and for TCP/IP, latency is the problem. Disk I/O speed (50MB/sec) …
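A back-of-the-envelope calculation (my own, not from the slides) makes the disk bottleneck concrete: single disks at roughly 50 MB/sec cannot fill a gigabit-class pipe, which is why striping across many disks and servers is needed.

    # How many ~50 MB/s disks does it take to keep a fast link busy?
    # Rough arithmetic only; all protocol overhead is ignored.
    DISK_MBITS = 50 * 8            # one disk: 50 MB/s = 400 Mbit/s
    for link_mbits in (1_000, 10_000):
        disks = -(-link_mbits // DISK_MBITS)   # ceiling division
        print(f"{link_mbits} Mbit/s link needs at least {disks} disks")
    # -> 3 disks for 1 Gbps, 25 disks for 10 Gbps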

Recall HISTORY: infrastructure for scientific research projects
Utilization of the top computing systems of each era, from the birth of the electronic computer:
① Numerical computation ⇒ tables, equations (EDSAC)
②③ Supercomputing (vector) ⇒ simulation (CDC-6600, CRAY-1)
④ Servers ⇒ databases, data mining, genome (SUN Fire)
⑤ Internet ⇒ information exchange, documentation (10G switch)
Scientific researchers always utilize top-end systems.

Frontier of Information Processing
New transition period: balance of computing systems
- Very high-speed network
- Large-scale disk storage
New infrastructure for cluster computers and data-intensive research.
(diagram: CPU [GFLOPS], memory [GB], network interface [Gbps], remote disks, local disks)

Research Projects with Data Reservoir

Data Reservoir: Basic Architecture
(diagram: two Data Reservoirs connected by a high-latency, very-high-bandwidth network)
- Distributed shared files (DSM-like architecture)
- Cache disks
- Local file accesses at each site
- Physically addressed, parallel, multi-stream transfer

Data-intensive scientific computation through SUPER-SINET
(diagram: data sources such as the Belle experiments, CERN, the X-ray astronomy satellite ASUKA, the SUBARU Telescope, Nobeyama Radio Observatory (VLBI), nuclear experiments, and the Digital Sky Survey feed Data Reservoirs over a very-high-speed network; distributed shared files; local accesses; data analysis at the University of Tokyo)

Design Policy
- Use of the iSCSI protocol
- Modification of the disk handler under the VFS layer
- Direct access to the raw device for efficient data transfer
- Multi-level striping for scalability
- Local file accesses through the LAN, global disk transfer through the WAN
- Single file image, file-system transparency
(diagram of the data-server stack: application / file system / md (RAID) driver / sd, sg, st drivers / iSCSI driver and iSCSI daemon / SCSI mid- and low-level drivers / disks)
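To make the multi-level striping concrete, here is a minimal sketch of how a logical block address could be spread first across file servers and then across disk servers and disks. All names and parameters here are illustrative assumptions, not the actual Data Reservoir code.

    # Hypothetical two-level striping map (illustrative only).
    FILE_SERVERS = 2        # 1st-level stripe width
    DISK_SERVERS = 4        # 2nd-level stripe width per file server
    DISKS_PER_SERVER = 4    # disks behind each disk server

    def locate(block):
        """Map a logical block to (file server, disk server, disk, offset)."""
        fs = block % FILE_SERVERS
        rest = block // FILE_SERVERS
        ds = rest % DISK_SERVERS
        rest //= DISK_SERVERS
        return fs, ds, rest % DISKS_PER_SERVER, rest // DISKS_PER_SERVER

    for b in range(8):      # consecutive blocks fan out across servers
        print(b, locate(b))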

File accesses on Data Reservoir
(diagram: scientific detectors and user programs reach file servers through an IP switch; file servers reach disk servers through a second IP switch; 1st-level striping across file servers, 2nd-level striping across disk servers; disk access by iSCSI)

File accesses on Data Reservoir: user’s view
(same diagram as above; the user sees a single file image over the striped file and disk servers)

Global Data Transfer
(diagram: scientific detectors and user programs, file servers, and disk servers behind an IP switch; the disk servers perform iSCSI bulk transfer across the global network)

Implementation (file server)
(software stack: application / system call / NFS / EXT2 / Linux RAID / iSCSI driver, sd driver, sg driver / TCP/UDP / IP / network)

Implementation (disk server)
(software stack: application layer / system call / iSCSI daemon / iSCSI driver / dr driver / sg driver / SCSI driver / data stripe / TCP / IP / network / disk)

Performance evaluation of Data Reservoir
1. Local experiment: 1 Gbps model (basic performance)
2. 40 km experiment: 1 Gbps model, Univ. of Tokyo ⇔ ISAS
3. 1600 km experiment: 1 Gbps model, 26 ms latency (Tokyo ⇔ Kyoto ⇔ Osaka ⇔ Sendai ⇔ Tokyo), high-quality network (SUPER-SINET Grid project lines)
4. US-Japan experiments: 1 Gbps model; Univ. of Tokyo ⇔ Fujitsu Lab. America (Maryland, USA); Univ. of Tokyo ⇔ SCinet (Maryland, USA)
5. 10 Gbps experiments, comparing switch configurations; trunking 8 Gigabit Ethernets is the bottleneck (8 Gbps):
   - Extreme Summit 7i, trunked 8 Gigabit Ethernets
   - RiverStone RS16000, trunked 8 1000BASE-SX
   - Foundry BigIron, 10GBASE-LR modules
   - Extreme BlackDiamond, trunked 1000BASE-SX
   - Foundry BigIron, trunked 2 10GBASE-LR

Performance comparison with ftp (40 km)
- ftp: optimal performance (minimum disk-head movements)
- iSCSI: queued operation
iSCSI transfer is 55% faster than ftp on a single TCP stream.

1600 km experiment system: 870 Mbps file-transfer bandwidth
Univ. of Tokyo (Cisco 6509)
↓ 1G Ether (Super-SINET)
Kyoto Univ. (Extreme BlackDiamond)
↓ 1G Ether (Super-SINET)
Osaka Univ. (Cisco 3508)
↓ 1G Ether (Super-SINET)
Tohoku Univ. (jumper fiber)
↓ 1G Ether (Super-SINET)
Univ. of Tokyo (Extreme Summit 7i)

AWOCA2003 IBM IBM IBM I B M Univ. of Tokyo Tohoku Univ. (sendai) Kyoto Univ. Osaka Univ. 550mile 300mile 250mile IBM IBM 1000mile line GbE Network for 1600km experiments ・ Grid project networks of SUPER-Sinet ・ One-way latency 26ms

Transfer speed on the 1600 km experiment
(chart: transfer rate in Mbps for each system configuration, file servers * disk servers * disks per disk server: 1*4*8, 1*4*(2+2), 1*4*4, 1*2*8, 1*2*(2+2), 1*2*4, 1*1*8, 1*1*(2+2), 1*1*4)
Maximum bandwidth by SmartBits = 970 Mbps; overheads of headers ~5%.
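The ~5% header figure can be checked by hand. A quick calculation, assuming a standard 1500-byte MTU and optionless TCP/IP headers (my arithmetic, not the authors'):

    # Per-frame cost of TCP over Gigabit Ethernet at MTU 1500.
    PREAMBLE, ETH_HDR, FCS, MIN_IPG = 8, 14, 4, 12   # bytes on the wire
    IP_HDR, TCP_HDR, MTU = 20, 20, 1500

    wire = PREAMBLE + ETH_HDR + MTU + FCS + MIN_IPG  # 1538 bytes per frame
    payload = MTU - IP_HDR - TCP_HDR                 # 1460 bytes of user data
    print(1000 * payload / wire)                     # ~949 Mbit/s goodput
    print(f"{100 * (1 - payload / wire):.1f}% overhead")   # ~5.1%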

10 Gbps experiment
Local connection of two 10 Gbps models: 10GBASE-LR, or 8 trunked 1000BASE-SX
24 disk servers + 6 file servers
- Dell 1650, 1.26 GHz Pentium III × 2, 1 GB memory, ServerSet III HE-SL
- NetGear GE NIC
- Extreme Summit 7i (trunking)
- Extreme BlackDiamond 6808
- Foundry BigIron (10GBASE-LR)
- RiverStone RS16000
… Gbps transfer BW

Performance on the 10 Gbps model
300 GBytes file transfer (iSCSI streams)
- ~5% header loss due to TCP/IP and iSCSI
- ~7% performance loss due to trunking
- Uneven use of disk servers
100 GB file transfer in 2 minutes

US-Japan Experiments at SC2002 Bandwidth Challenge
92% usage of bandwidth using TCP/IP

Brief Explanation of TCP/IP

User’s view: TCP is a PIPE
(diagram: the byte stream "abcde" goes into TCP, crosses the Internet, and comes out as the same data in the same order)

TCP’s view
Underneath the byte-stream abstraction, TCP must: check that all data has arrived; re-order data that arrives in the wrong order; ask for a re-send when data is missing; and control the sending speed.

TCP’s features
- Keep data until an acknowledgement (ACK) arrives.
- Speed control (congestion control) without knowing the state of the routers.
- Use a buffer (window): when an ACK arrives from the receiver, new data is moved into the buffer.
- Make the buffer (window) small when congestion is suspected.

Window Size and Throughput
Roughly speaking (RTT: round-trip time):
Throughput = Window Size / RTT
Hence a longer RTT needs a larger window size for the same throughput.
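A worked example of what the formula implies, using RTTs that appear elsewhere in this talk:

    # Window = Throughput * RTT (bytes), from Throughput = Window / RTT.
    def window_bytes(throughput_bps, rtt_s):
        return throughput_bps / 8 * rtt_s

    print(window_bytes(1e9, 0.026) / 1e6)  # 1 Gbps at 26 ms:  ~3.3 MB window
    print(window_bytes(1e9, 0.200) / 1e6)  # 1 Gbps at 200 ms: ~25 MB window
    # A classic 64 KB TCP window caps a 200 ms path at about 2.6 Mbps:
    print(64 * 1024 * 8 / 0.200 / 1e6)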

Congestion Control: AIMD (Additive Increase, Multiplicative Decrease)
(chart: window size vs. time; the window grows with every ACK, doubling each RTT, in the start phase, then enters the AIMD phase)
Accelerate gradually once congestion has occurred; slow down rapidly when congestion is expected.
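A minimal simulation sketch of this behavior; the parameters are illustrative, and real TCP also reacts within an RTT, retransmits, and so on:

    # Slow start + AIMD, one step per round-trip time.
    def simulate(rtts, losses):
        w, ssthresh, trace = 1.0, float("inf"), []
        for t in range(rtts):
            trace.append(w)
            if t in losses:            # congestion signal
                ssthresh = w / 2       # multiplicative decrease: halve
                w = max(ssthresh, 1.0)
            elif w < ssthresh:
                w *= 2                 # slow start: double every RTT
            else:
                w += 1                 # additive increase: +1 segment/RTT
        return trace

    print(simulate(20, {6, 14}))
    # exponential growth up to the first loss, then the familiar sawtooth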

Another problem
Call a network with long latency and wide bandwidth an LFN (Long Fat Network, "long fat pipe").
An LFN needs a large window size, but since each window increment is triggered by an ACK, the speed of the increase is also SLOW. (LFNs suffer under AIMD.)
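Some arithmetic (mine, using the 200 ms US-Japan RTT from the experiments) shows just how slow additive increase is on an LFN:

    # Time for additive increase to climb back after a single loss.
    MSS = 1460        # bytes per segment (typical)
    rtt = 0.200       # seconds
    rate = 1e9        # target throughput: 1 Gbps

    w = rate / 8 * rtt / MSS       # ~17,000 segments in flight at full rate
    rtts_needed = w / 2            # halved window regrows by 1 segment/RTT
    print(rtts_needed * rtt / 60)  # ~28 minutes to recover from one loss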

Network Environment
The bottleneck is about 600 Mbps; note that 600 Mbps < 1 Gbps.

92% bandwidth usage with TCP/IP is good, but we still have a PROBLEM
Several streams only get to work after the other streams finish.

Fastest and slowest stream in the worst case
(chart: sequence number vs. time)
The slowest stream is 3 times slower than the fastest, and even after the other streams finish, its throughput does not recover.

Hand-made Tools
- DR Gigabit Network Analyzer: needs accurate timestamps (100 ns accuracy) and must dump full packets; on Gigabit Ethernet a packet is sent every 12 μsec.
- Comet Delay and Drop: a pseudo Long Fat Network (LFN).

Programmable NIC (Network Interface Card)

DR Giga Analyzer

Comet Delay and Drop

Unstable Throughput
In our long-distance data transfers, throughput ranged from 8 Mbps to 120 Mbps (when using the Gigabit Ethernet interface).

Fast Ethernet is very stable

Analysis of a single stream: number of packets, with 200 msec RTT

Packet Distribution
(chart: number of packets per msec vs. time in seconds)

Packet Distribution of Fast Ethernet
(chart: number of packets per msec vs. time in seconds)

Gigabit Ethernet interface vs. Fast Ethernet interface
Even at the same "20 Mbps", the behavior of 20 Mbps on a Gigabit Ethernet interface and 20 Mbps on a Fast Ethernet interface is completely different. Gigabit Ethernet is very bursty; routers might not like this.

Problems
Once packets are sent in bursts, the router sometimes cannot cope (unlucky streams are slow, lucky streams are fast), especially when the bottleneck is below a gigabit. More than 80% of the time, the sender does not send anything.

Problem of implementation
At 1 Gbps, supposing a 1500 B Ethernet packet, one packet should be sent every 12 μsec. On the other hand, the UNIX kernel timer is 10 msec.
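The mismatch in two lines of arithmetic (mine):

    interval = 1500 * 8 / 1e9    # 12 us between packets at 1 Gbps
    tick = 10e-3                 # UNIX kernel timer granularity: 10 ms
    print(interval * 1e6)        # 12.0 us
    print(tick / interval)       # ~833 packets per timer tick
    # Pacing from the kernel timer would emit ~833-packet bursts
    # instead of one packet every 12 us.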

IPG (Inter-Packet Gap)
The transmitter is always on; when no packet is being sent, it is in the idle state. Each frame is followed by an IPG of at least 12 bytes (IEEE 802.3). The IPG is tunable via the e1000 driver (8 bytes to 1023 bytes).

IPG tuning for short distance

                      IPG 8 bytes   IPG 1023 bytes
    Fast Ethernet     94.1 Mbps     56.7 Mbps
    Gigabit Ethernet  941 Mbps      567 Mbps

Supposing a 1500-byte Ethernet frame, 1508 : 2523 is approximately 567 : 941, so these values match theory. (Gigabit Ethernet was already perfectly tuned for short-distance data transfer.)
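The slide's ratio can be sanity-checked directly (my arithmetic; preamble and framing overhead are ignored, as on the slide):

    # Stretching the IPG from 8 to 1023 bytes with 1500-byte frames
    # scales throughput by (1500 + 8) / (1500 + 1023) = 1508 / 2523.
    scale = (1500 + 8) / (1500 + 1023)    # ~0.598
    for rate_at_ipg8 in (94.1, 941.0):    # measured Mbit/s at IPG = 8
        print(round(rate_at_ipg8 * scale, 1))
    # -> 56.2 and 562.5, close to the measured 56.7 and 567 Mbit/s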

IPG tuning for Long Distance

Max, min, average, and standard deviation of throughput
(chart; Fast Ethernet shown for comparison)

Some patterns of throughput change

Detail (Slow-Start Phase)

Packet Distribution

But
these are like ad-hoc patches. What is the essential problem?

One big problem
A good MODEL does not exist. Old-style models do not work well: M/M/1 queueing theory assumes a Poisson packet distribution, and experiment says it is not a good fit. Currently, simulation and the real network are the only ways to check. (No theoretical background.)

What is the difference from the telephone network?
AUTONOMY

For the telephone network, the telephone company knows, manages, and controls the whole network. End nodes don’t have to do heavy jobs, such as congestion control.

Current Trend (?)
Analyze the NETWORK using game theory: Nash equilibrium.