Gigabit Routing on a Software-exposed Tiled-Microprocessor

James W. Anderson, Anthony Degangi, Anant Agarwal, Umar Saif
MIT Computer Science and AI Laboratory

Network Routers
- Line rates from x Kb/sec to x Gb/sec
- From ~5 ports to ~10^2 ports
- From a network "switch" to a network "processor"

Three Challenges
- Performance: 5 to 10 Gb/sec (OC-192)
- Architectural scalability: throughput grows x2.2/year; port counts of 10 to 100 for edge routers
- Programmability:
  - Network services: NAT, firewalls, VPN, "Layer 7" switches
  - Monitoring: loss rate, link utilization, traffic patterns

Network Processors
- Conventional wisdom: network processors
- Alternative: tiled "all-purpose" architectures

MIT RAW Microprocessor
- Tiled architecture with low-latency mesh networks and software-exposed pins
- Each tile contains:
  - An 8-stage, 32-bit, MIPS-style, single-issue, in-order compute processor
  - A 4-stage, 32-bit pipelined FPU
  - 32 KB data cache and 32 KB instruction memory
  - Routers and wires for three on-chip mesh networks
- Networks:
  - 8 32-bit channels
  - 2 dynamic networks with dimension-ordered routing (DOR): Memory Dynamic Network (MDN) and General Dynamic Network (GDN)
  - 2 static networks, supporting streaming and tile multicast
- Wires are registered at the input, so the longest wire is the length of one tile

RAW Microprocessor and Network Routing
- Software-exposed tiled architecture --> parallel processing
- Software-exposed pins --> flexible buffering
- Software-exposed point-to-point networks --> efficient, scalable switching

However...
- Processing: network processors use special-purpose hardware; RAW runs software on general-purpose tiles.
- Switching: network processors use a special-purpose switching fabric; RAW uses its general-purpose on-chip networks.
- Buffering: network processors use centrally accessible, specialized memory controllers with dedicated interconnects; on RAW, buffer memory is external to the chip, connected to the software-exposed pins, and accessed via the on-chip networks.

IPv4 Router: RFC 1812
- Lookup: DIR-24-8-BASIC [Gupta98]
- Header verification
- TTL update and header recompute: incremental checksum [RFC 1141]
- Switch to destination
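The verify and TTL-update steps can be sketched in Python as follows, assuming the IPv4 header is held in a bytearray. Only a subset of the RFC 1812 receive checks is shown. For the incremental checksum the sketch uses the RFC 1624 form HC' = ~(~HC + ~m + m'), which refines the RFC 1141 method cited on the slide by fixing an edge case involving one's-complement negative zero. All function names are hypothetical, not taken from the RAW router.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words, with end-around carry."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def header_checksum(header: bytes) -> int:
    """Full checksum, computed with the checksum field (bytes 10-11) zeroed."""
    zeroed = bytes(header[:10]) + b"\x00\x00" + bytes(header[12:])
    return ~ones_complement_sum(zeroed) & 0xFFFF

def verify_header(header: bytes) -> bool:
    """Subset of RFC 1812 checks: length, version, IHL, total length, checksum."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    (total_length,) = struct.unpack_from("!H", header, 2)
    if version != 4 or ihl < 5 or len(header) < ihl * 4 or total_length < ihl * 4:
        return False
    # Summing a correct header *including* its checksum yields 0xFFFF.
    return ones_complement_sum(bytes(header[: ihl * 4])) == 0xFFFF

def decrement_ttl(header: bytearray) -> None:
    """Decrement TTL and patch the checksum incrementally (RFC 1624, Eqn. 3)."""
    if header[8] <= 1:
        # TTL would reach zero: RFC 1812 requires the packet to be discarded.
        raise ValueError("TTL expired")
    (old_word,) = struct.unpack_from("!H", header, 8)  # TTL (high byte) | protocol
    header[8] -= 1
    (new_word,) = struct.unpack_from("!H", header, 8)
    (old_csum,) = struct.unpack_from("!H", header, 10)
    s = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    s = (s & 0xFFFF) + (s >> 16)  # two folds absorb every possible carry
    s = (s & 0xFFFF) + (s >> 16)
    struct.pack_into("!H", header, 10, ~s & 0xFFFF)
```

Because TTL occupies the high byte of one 16-bit header word, only that word changes, so the checksum can be patched from three 16-bit values instead of re-summing the whole header.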

Evaluation Methodology
- Maximum loss-free forwarding rate (MLFFR):
  - Minimum-sized 64-byte packets: millions of packets per second (Mpps)
  - Maximum-sized 1500-byte packets: Gb/sec
  - Captured Internet trace: ~128-byte packets
- Packet latency
- RAW clocked at 425 MHz
- Comparison with the IXP1200 as a reference point
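The two reporting units are related through the packet size. The following sketch assumes bit rates that count only packet bytes, with no Ethernet preamble or inter-frame gap. It converts a packet rate into a bit rate and computes the per-packet cycle budget at RAW's 425 MHz clock. At 9.79 Mpps, for example, the whole chip gets only about 43 cycles per packet, which illustrates why the routing work has to be spread across tiles. The names are illustrative.

```python
CLOCK_HZ = 425e6  # RAW clock rate used in the evaluation

def gbps_from_pps(pps: float, packet_bytes: int) -> float:
    """Bit rate in Gb/s for `pps` packets/s of `packet_bytes` bytes (no framing overhead)."""
    return pps * packet_bytes * 8 / 1e9

def cycles_per_packet(pps: float, clock_hz: float = CLOCK_HZ) -> float:
    """Clock cycles available per packet if packets were handled strictly one at a time."""
    return clock_hz / pps

print(f"{gbps_from_pps(9.79e6, 64):.2f} Gb/s")          # 64-byte packets at 9.79 Mpps
print(f"{cycles_per_packet(9.79e6):.1f} cycles/packet")  # whole-chip budget per packet
```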

RAW Router, Take 1: Parallelism
Eight line cards feed the chip. The IPv4 work is split across tiles: header verify, a 2-stage lookup, and header recompute, followed by a drain tile with a drain FIFO and an interrupt. External SRAM and DRAM hold the packet buffer and the lookup tables, and this arrangement appears on both sides of the chip.
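The 2-stage lookup follows DIR-24-8-BASIC [Gupta98]. A 2^24-entry first table is indexed by the top 24 address bits. Each entry holds either a next hop or a pointer to a 256-entry block indexed by the last 8 bits, so any lookup costs at most two memory reads. The following Python sketch makes several assumptions: 15-bit next-hop identifiers, 0 meaning "no route", and routes inserted in order of increasing prefix length. Arbitrary update order would need extra bookkeeping. With 16-bit entries the first table alone occupies 32 MiB. The class, method, and helper names are hypothetical.

```python
from array import array
from ipaddress import IPv4Address

LONG_FLAG = 0x8000  # top bit of a TBL24 entry: the low 15 bits select a TBLlong block

class Dir24_8:
    """Two-stage longest-prefix-match table in the style of DIR-24-8-BASIC."""

    def __init__(self) -> None:
        self.tbl24 = array("H", [0]) * (1 << 24)  # indexed by the top 24 address bits
        self.tbl_long = array("H")                # 256-entry blocks for prefixes longer than /24

    def add_route(self, prefix: int, length: int, next_hop: int) -> None:
        """Insert prefix/length -> next_hop. Routes must arrive in increasing `length` order."""
        if not 1 <= next_hop < LONG_FLAG:
            raise ValueError("next hop must be in 1..0x7FFF")
        if length <= 24:
            span = 1 << (24 - length)             # TBL24 entries covered by this prefix
            start = (prefix >> 8) & ~(span - 1)
            self.tbl24[start:start + span] = array("H", [next_hop]) * span
            return
        idx = prefix >> 8
        entry = self.tbl24[idx]
        if entry & LONG_FLAG:
            block = entry & 0x7FFF
        else:
            block = len(self.tbl_long) // 256
            if block >= LONG_FLAG:
                raise OverflowError("out of TBLlong blocks")
            # The new block inherits whatever /<=24 route covered this /24.
            self.tbl_long.extend(array("H", [entry]) * 256)
            self.tbl24[idx] = LONG_FLAG | block
        span = 1 << (32 - length)
        base = block * 256 + ((prefix & 0xFF) & ~(span - 1))
        self.tbl_long[base:base + span] = array("H", [next_hop]) * span

    def lookup(self, addr: int) -> int:
        """Next hop for a 32-bit address, or 0 if there is no route (at most two reads)."""
        entry = self.tbl24[addr >> 8]
        if entry & LONG_FLAG:
            return self.tbl_long[(entry & 0x7FFF) * 256 + (addr & 0xFF)]
        return entry

def ip(text: str) -> int:
    """Dotted-quad string to a 32-bit integer."""
    return int(IPv4Address(text))

table = Dir24_8()
table.add_route(ip("10.0.0.0"), 8, 1)
table.add_route(ip("10.1.2.0"), 24, 2)
table.add_route(ip("10.1.2.128"), 25, 3)
print(table.lookup(ip("10.1.2.200")))  # 3: the /25 wins over the /24 and the /8
```

Only prefixes longer than /24 pay for the second read, and they occupy a second-level block only for the /24 that they fall in.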

Flow of Packets
Stages: L = Lookup, V = Verify, U = Update, D = Drain. Packets from the line cards pass through two parallel L-V-U-D pipelines, each with its own lookup DRAM.
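When the four stages run on separate tiles, the slowest stage sets a pipeline's packet rate, while a packet's latency is the sum of all its stages. The following sketch makes this explicit. The per-stage cycle counts are invented for illustration and are not measurements from these slides, and transfer time on the on-chip networks is ignored. All names are hypothetical.

```python
CLOCK_HZ = 425e6  # RAW clock rate used in the evaluation

# Hypothetical per-packet cycle costs for the Lookup, Verify, Update and Drain stages.
STAGE_CYCLES = {"L": 40, "V": 20, "U": 25, "D": 60}

def pipeline_rate(stage_cycles: dict[str, int], pipelines: int = 1,
                  clock_hz: float = CLOCK_HZ) -> float:
    """Packets/s for `pipelines` identical pipelines, each limited by its slowest stage."""
    return pipelines * clock_hz / max(stage_cycles.values())

def pipeline_latency(stage_cycles: dict[str, int]) -> int:
    """Cycles one packet spends traversing every stage in order."""
    return sum(stage_cycles.values())

print(f"{pipeline_rate(STAGE_CYCLES, pipelines=2) / 1e6:.2f} Mpps")  # two L-V-U-D pipelines
print(pipeline_latency(STAGE_CYCLES), "cycles")
```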

RAW Router, Take 1: Network Usage
- Static network, for streaming packets: feeds the pipeline and streams the payload to DRAM
- General Dynamic Network (GDN): header forwarding (3 -> 4)
- Memory Dynamic Network (MDN): from memory to the line card

Version I Performance
- 1500-byte packets: 1.8 Gb/sec --> 6.17 Gb/sec
- 64-byte packets: 2.9 Mpps --> 6.23 Mpps

RAW Router, Version 1: Shared Buffering
All eight line cards share the SRAM/DRAM packet buffers over the Memory Dynamic Network. The MDN uses dimension-ordered routing (x, then y), so transfers to and from the shared memories converge on the same links, which causes "bus contention".
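Dimension-ordered routing moves a message along x until the column matches and then along y, so flows headed for the same memory port end up on the same final links. The following sketch counts how many flows cross each directed mesh link. The (x, y) tile coordinates and the memory port attached at tile (3, 0) are hypothetical choices for illustration.

```python
from collections import Counter

Coord = tuple[int, int]

def xy_route(src: Coord, dst: Coord) -> list[tuple[Coord, Coord]]:
    """Directed links used by x-then-y dimension-ordered routing on a 2-D mesh."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:                       # move along x first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                       # then along y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def link_load(flows: list[tuple[Coord, Coord]]) -> Counter:
    """Number of flows crossing each directed link."""
    load: Counter = Counter()
    for src, dst in flows:
        load.update(xy_route(src, dst))
    return load

# Three senders in row 0 all transfer into a memory port attached at tile (3, 0).
load = link_load([((0, 0), (3, 0)), ((1, 0), (3, 0)), ((2, 0), (3, 0))])
print(load[((2, 0), (3, 0))])  # all three flows share the last link
```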

RAW Router, Take 2: Buffering and Switching
The processing stages are as in Take 1 (header verify, a 2-stage lookup on two lookup tiles, header recompute, and a drain tile with a drain FIFO and an interrupt). Packets, however, are now buffered in four SDRAMs rather than the shared SRAM/DRAM.

RAW Router, Take 2: Network Usage
- Respects DOR: no "bus contention" for DMAs (the bottleneck is now the shared SDRAMs)
- 2x memory bandwidth
- No need to look at the packet length
- Dynamic networks (GDN) for "out-of-band" communication

Optimized Buffering and Switching
- 1500-byte packets: 6.17 Gb/sec --> 8.68 Gb/sec
- 64-byte packets: 6.23 Mpps --> 6.77 Mpps

RAW Router, Take 3: Reducing Memory Transactions
- Streaming DDR
- No fragmentation of frames
- Pipelined memory requests
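Pipelining memory requests helps because new requests are issued before earlier ones return, so the memory latency is paid roughly once per frame rather than once per request. The following sketch uses assumed numbers, none of which come from these slides: a 1500-byte frame moved as hypothetical 32-byte bursts, a hypothetical 30-cycle memory latency, and one new request accepted every 4 cycles.

```python
import math

def bursts_for(frame_bytes: int, burst_bytes: int) -> int:
    """Memory requests needed to move one frame."""
    return math.ceil(frame_bytes / burst_bytes)

def serial_cycles(requests: int, latency: int) -> int:
    """Each request waits for the previous one to complete."""
    return requests * latency

def pipelined_cycles(requests: int, latency: int, issue_interval: int) -> int:
    """A request is issued every `issue_interval` cycles; the last completes `latency` after issue."""
    if requests == 0:
        return 0
    return (requests - 1) * issue_interval + latency

n = bursts_for(1500, 32)  # hypothetical 32-byte bursts
print(n, serial_cycles(n, 30), pipelined_cycles(n, 30, 4))
```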

Streaming Packet Buffers + 64-byte Minimum Buffering
- 1500-byte packets: 8.68 Gb/sec --> 9.57 Gb/sec
- 64-byte packets: 6.77 Mpps --> 9.79 Mpps

Buffering on Line Cards
- 1500-byte packets: 9.57 Gb/sec --> 15.03 Gb/sec
- 64-byte packets: 9.79 Mpps --> 9.79 Mpps

All Dynamic Networks
- 1500-byte packets: 9.57 Gb/sec --> 8.50 Gb/sec
- 64-byte packets: 9.79 Mpps --> 6.94 Mpps
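The design steps are easier to compare as relative changes. The following sketch applies plain percentage arithmetic to the Gb/sec figures reported on these slides. The labels paraphrase the slide titles.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100

# (before, after) Gb/sec pairs as reported on the slides
STEPS = {
    "Optimized buffering and switching": (6.17, 8.68),
    "Streaming packet buffers": (8.68, 9.57),
    "Buffering on line cards": (9.57, 15.03),
    "All dynamic networks": (9.57, 8.50),
}

for name, (before, after) in STEPS.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
```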

Evaluation with captured Trace

Packet Latency (RAW clocked at 425 MHz)

Router     Packet size (bytes)   Cycles   Time (ns)
RAW null     64                     416        979
RAW IPv4     64                     690       1624
RAW null   1500                    3490       8212
RAW IPv4   1500                    5394      12692
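At the 425 MHz clock used in the evaluation, one cycle lasts 1000/425 ≈ 2.35 ns, so a latency in nanoseconds is the cycle count multiplied by 1000/425. The following minimal helper applies that conversion to the cycle counts in the latency table. The function name is illustrative.

```python
CLOCK_MHZ = 425  # RAW clock rate used in the evaluation

def cycles_to_ns(cycles: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """Wall-clock time in nanoseconds for `cycles` clock cycles at `clock_mhz` MHz."""
    return cycles * 1000 / clock_mhz

for cycles in (416, 690, 3490, 5394):  # cycle counts from the latency table
    print(f"{cycles:5} cycles = {cycles_to_ns(cycles):6.0f} ns")
```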

Conclusions
- Tiled architectures = NPU performance + enhanced programmability
- RAW's low-level software control was vital for deriving performance:
  - Layout of routing functions: 30% improvement by altering the layout
  - Role and behavior of the on-chip networks: 15% improvement by using the GDN and static networks in place of the MDN

Conclusions (continued)
- Network-oblivious: 30-35% degradation
- No static networks: 10-30% degradation
- Buffering on line cards: 35% improvement

Thank you! Questions: umar@mit.edu