An Architecture and Prototype Implementation for TCP/IP Hardware Support Mirko Benz Dresden University of Technology, Germany TERENA 2001.


Motivation – Demanding Services
[Figure: required processing power plotted against application complexity (instructions per packet); services shown: Routing, Switching, Quality of Service Support, Internet Security Provision]

Motivation – MIPS versus Bandwidth Trend
[Figure: MIPS performance and bandwidth plotted against technological progress over time; curves: Processor Performance Evolution (~100% per 18 months) and Available Bandwidth]
- Hardware Support for Protocol Processing Acceleration
- Case Study: TCP

Project Overview
Assumptions & Preconditions
- Restriction to Local Area Networks (e.g. Gigabit Ethernet)
- High Bandwidth and Low Error Probability
- Concentration on Host Implementations
[Diagram: Protocol Analysis, TCP/IP Partitioning, System Simulation, Evaluation, Optimisation, Efficient OS Integration, Prototype Variants, Flexible Protocol Engine, Domain Specific Methodology]

Talk Outline
- TCP Protocol Performance Evaluation
- TCP Acceleration Approach
- System Simulation Environment
- Operating System Integration
- Hardware Implementation Directions
- Myrinet Implementation and Results
- Conclusions and Outlook

TCP Protocol Performance Evaluation
TCP Software Implementation Structure
[Diagram: layer stack of Network Driver, IP, TCP, Socket, Application]
Sources of Protocol Processing Overhead
- Communication, Synchronisation
- Operating System Call Overhead
- Copy Operation
- Classification: Per-Byte / Per-Packet
Optimisation Opportunities
- Interrupt Suppression
- Zero Copy Mechanisms
- User Level Networking
- Checksum Offloading (e.g. Task Offload)
- Extending Frame Sizes (e.g. Jumbo Frames)
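The checksum is a typical per-byte cost, because every payload byte has to be read once to compute it. As a minimal sketch (not taken from the slides), the 16-bit one's-complement Internet checksum described in RFC 1071 can be written as follows; the function name `internet_checksum` is chosen here for illustration only.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Add the data as big-endian 16-bit words; this loop touches every byte.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold carries out of the low 16 bits back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example words from RFC 1071: 0001 f203 f4f5 f6f7
example = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(example)
```

Appending the checksum to the data and recomputing yields zero, which is how a receiver can verify a segment. Checksum offloading moves this loop from the host CPU to the network adapter.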

TCP Protocol Performance Evaluation
Performance TCP versus Myrinet GM:
- Throughput 335/967 Mbit/s (TCP/Myrinet)
- Latency 81/29 µs (TCP/Myrinet)
- 100% CPU Utilisation
- (RedHat Linux 6.2 / PIII 500 MHz)

Goals?
- Software Implementation as a Foundation
- Achieve On-Wire Compatibility
- Consider Different Target Architectures
- Develop Reusable Hardware Components
- Integration of High Level Tools
- System Wide Optimisation
- Efficient, Transparent Operating System Integration
[Diagram labels: Domain Specific Methodology, Flexible Protocol Engine]

TCP Acceleration Approach
TCP SW Stack Complexity
- General Purpose Protocol
- Not Designed for High End Networking
- Many Interdependent Algorithms
- Often Modified, Adapted, Optimised
- ~ Lines C
Approach
- TCP Partitioning -> Fast Path Extraction
- Hardware Support -> Acceleration
- Operating System Bypass
HW/SW Synchronisation
- Initialisation, Termination/Error
Transparent Integration
- Socket Level Switch
[Diagram: Network Driver, IP, TCP, Socket, Application, PE]

Fast Path Protocol Processing
[Diagram: Sender and Receiver, each with TCP Send, TCP Recv, Send Ack and a Connection Context, exchanging Data and Ack over the Network]
- Only for User Data Exchange
- No Connection Management
- No Error Recovery – Only Detection
- Complexity ~10% of SW Stack
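The partitioning above can be pictured as a classifier in front of the protocol engine: segments that carry plain user data on an established connection stay on the fast path, and everything else is handed to the software stack. The sketch below is an illustration only. The type names, fields and exact eligibility criteria are assumptions made here and are not taken from the slides.

```python
from dataclasses import dataclass

# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


@dataclass
class ConnectionContext:  # hypothetical per-connection state
    established: bool
    rcv_nxt: int  # next sequence number expected from the peer


@dataclass
class Segment:  # hypothetical, already parsed TCP segment
    flags: int
    seq: int
    payload_len: int
    checksum_ok: bool


def takes_fast_path(ctx: ConnectionContext, seg: Segment) -> bool:
    """Return True only for in-order user data on an established connection."""
    if not ctx.established:
        return False  # connection management stays in software
    if seg.flags & (SYN | FIN | RST | URG):
        return False  # setup, teardown and urgent data stay in software
    if not seg.flags & ACK:
        return False
    if not seg.checksum_ok or seg.seq != ctx.rcv_nxt:
        return False  # error detected; recovery is left to software
    return True


def receive(ctx: ConnectionContext, seg: Segment) -> str:
    """Dispatch one segment and report which path handled it."""
    if takes_fast_path(ctx, seg):
        ctx.rcv_nxt = (ctx.rcv_nxt + seg.payload_len) % 2**32
        return "fast"
    return "slow"
```

For example, with `ctx = ConnectionContext(True, 1000)`, an in-order data segment `Segment(ACK | PSH, 1000, 100, True)` is accepted on the fast path and advances `ctx.rcv_nxt` to 1100, while a repeated or out-of-order segment falls back to the slow path.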

System Simulation Environment
[Diagram: Netserver / Socket / User Mode Linux (shown twice), Netperf / Socket / User Mode Linux, Network Simulator, CORBA, ISA Simulator (TCP Fast Path SW), VHDL Simulator (TCP Fast Path HW), Evaluation]
- Complex Communication System
- Real Applications, Operating System (User Mode Linux)
- Network Simulation – Error Injection
- Fast Path Implementation: Hardware/Software
- System Evaluation: Functionality & Performance

Fast Path Hardware Implementation Directions
Embedded RISC Processor
- LEON Sparc 33 MHz, INTEL StrongARM 200 MHz
- OS: ucLinux, GNU C Environment
Intelligent Network Adapter (Myrinet)
- RISC Core with User/Network Interface, DMA Engines
- Control Program Modification, no Operating System
Network Processor (INTEL IXP1200)
- 6 multithreaded microengines
- Development: IXP Assembler, Simulator
Specific Hardware
- High Level FPGA Design Flow, XILINX Virtex
- SYNOPSYS Protocol Compiler
[Diagram labels: Software, Hardware]

Myrinet Implementation Platform
Technology
- Packet-Communication and Switching Technology
- High-Performance, Highly Reliable
- System-Area Network, Cluster Interconnect
Intelligent Network Adapter
[Diagram: LANai 7 with Local SRAM, Host Interface, Packet Interface, RISC, PCI Bridge, DMA Controller, Myrinet Link; labels: 64 bit; 64 bit, 33 MHz; 1280 Mbit/s]

TCP Fast Path / Myrinet
Development Environment
- Host SW GM (message passing), Firmware MCP – open source
- GNU C Suite, no OS, one context only, no Interrupts
Implementation
- MCP: 4 Event Driven State Machines
- Fast Path Integration within Network Send & Recv Code
- Exploitation of Hardware Support for Checksum Computation
- No specific Optimisations, Some Limitations
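Firmware that runs without an operating system and without interrupts is typically organised as a single polling loop that dispatches events to handlers. The following sketch shows that general pattern only. It does not reproduce the MCP, and the class, event and handler names are hypothetical.

```python
from collections import deque


class PolledDispatcher:
    """Single-context event loop: no threads, no interrupts, only polling."""

    def __init__(self):
        self.events = deque()  # pending (event name, payload) pairs
        self.handlers = {}     # event name -> handler function
        self.log = []          # record of handled events, for inspection

    def register(self, name, handler):
        self.handlers[name] = handler

    def post(self, name, payload=None):
        self.events.append((name, payload))

    def run_until_idle(self):
        """Handle queued events in order; return how many were handled."""
        handled = 0
        while self.events:
            name, payload = self.events.popleft()
            self.handlers[name](self, payload)
            handled += 1
        return handled


def on_host_send(dispatcher, payload):
    # Hypothetical handler: the host queued data, pass it to the network side.
    dispatcher.log.append(("host_send", payload))
    dispatcher.post("net_send", payload)


def on_net_send(dispatcher, payload):
    # Hypothetical handler: fast path send processing would hook in here.
    dispatcher.log.append(("net_send", payload))


dispatcher = PolledDispatcher()
dispatcher.register("host_send", on_host_send)
dispatcher.register("net_send", on_net_send)
dispatcher.post("host_send", b"data")
handled = dispatcher.run_until_idle()
```

In a structure like this, protocol processing can be added inside the existing send and receive handlers rather than as a separate task, which matches the single-context constraint listed above.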

TCP Fast Path / Myrinet Performance Results
Performance
- Test Setup: INTEL PIII/500MHz, Myrinet LAN Adapter, Linux OS
- Netperf Benchmark Throughput/Delay
- Throughput Peak: 967, 816, 333 Mbit/s (GM, Fast Path, TCP)
- Delay Minimum: 16.5, 49, 81 µs (GM, Fast Path, TCP)
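To put the reported figures in proportion, the short calculation below derives relative gains from the numbers on this slide. Only the input values come from the slide. The variable names and the choice of ratios are illustrative.

```python
# Figures copied from the results slide (Netperf, PIII/500 MHz).
throughput_mbit = {"GM": 967, "Fast Path": 816, "TCP": 333}
min_delay_us = {"GM": 16.5, "Fast Path": 49, "TCP": 81}


def ratio(values, a, b):
    """Return values[a] divided by values[b]."""
    return values[a] / values[b]


# Fast path throughput relative to the software TCP stack.
speedup_vs_tcp = ratio(throughput_mbit, "Fast Path", "TCP")
# Share of the native GM throughput that the fast path reaches.
share_of_gm = ratio(throughput_mbit, "Fast Path", "GM")
# Fraction by which the minimum delay drops compared with software TCP.
delay_reduction = 1 - ratio(min_delay_us, "Fast Path", "TCP")

print(f"throughput vs TCP: {speedup_vs_tcp:.2f}x")
print(f"share of GM throughput: {share_of_gm:.0%}")
print(f"delay reduction vs TCP: {delay_reduction:.0%}")
```

By these figures the fast path delivers roughly 2.45 times the throughput of the software TCP stack, reaches about 84% of native GM throughput, and cuts the minimum delay by about 40%.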

Summary & Outlook
Integrated Architecture and Design Flow for Protocol Processing Acceleration
- TCP Partitioning
- System Simulation Environment
- Integration with existing SW TCP Stack & OS
Prototype with Promising Performance
Present Work:
- Fast Path HW Implementation and SoC Integration
[Diagram: Protocol Analysis, TCP/IP Partitioning, System Simulation, Efficient OS Integration, Prototype Variants, Evaluation, Optimisation, Flexible Configurable Protocol Engine]