Paper Review Building a Robust Software-based Router Using Network Processors.

ABSTRACT
- The need for more services motivates software-based routers.
- Router platform: an IXP1200 network processor development board plus a PC.
- Performance: 3.47 Mpps for minimum-sized packets, or 1.77 Gbps of aggregate link bandwidth.
- Hierarchical architecture: guarantees line speed for forwarding simple packets, with extra capacity for exceptional packets on the Pentium (P3): 310 Kpps with 1510 cycles available for each packet.

INTRODUCTION
- Most network processors rely on parallelism.
- IXP1200: 6 MicroEngines (MEs), each supporting up to 4 hardware contexts.
- The router has a data plane (MEs) and a control plane (P3).
- Processor hierarchy, from more to fewer cycles per packet:
  - OSPF, updating routing tables, … [more cycles]
  - Packets that miss in the cache
  - Minimal packet processing, forwarding, … [fewer cycles]

ARCHITECTURE-Software
- Components: classifier, forwarder, scheduler, input queue.
- Two default forwarders:
  - Minimal IP forwarding fast path
  - Full IP protocol (IP options)
- Two main attributes:
  - Explicit support for adding new forwarders at run time
  - Does not specify where in the processor hierarchy a forwarder runs

ARCHITECTURE-Hardware
IXP1200 evaluation system (200 MHz):
- 32 MB DRAM (64-bit, 100 MHz)
- 2 MB SRAM (32-bit, 100 MHz)
- 4 KB on-chip scratch memory
- 64-bit, 66 MHz IX bus
- Ethernet ports (8 × 100 Mbps + 2 × 1 Gbps)
- 32-bit, 100 MHz PCI bus
- 4 KB ISTORE for each ME
- 4 KB I-cache for the StrongARM
- A pair of FIFOs (16 slots × 64 bytes each)
Bandwidths:
- DRAM rate = 6.4 Gbps
- Send/receive bandwidth = 2 × (8 × 100 Mbps + 2 × 1 Gbps) = 5.6 Gbps
- Capacity of the IX bus = 4 Gbps
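The bandwidth figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a bus moves its full width once per clock cycle, and the helper name `bus_bandwidth_bps` is made up for this example.

```python
# Back-of-the-envelope check of the slide's bandwidth figures.
# Assumption: a bus transfers its full width once per clock cycle.
MBPS = 10**6
GBPS = 10**9


def bus_bandwidth_bps(width_bits, clock_hz):
    """Peak bandwidth in bits per second (hypothetical helper)."""
    return width_bits * clock_hz


dram_bps = bus_bandwidth_bps(64, 100 * 10**6)   # 64-bit DRAM at 100 MHz
ix_bus_bps = bus_bandwidth_bps(64, 66 * 10**6)  # 64-bit IX bus at 66 MHz

# Eight 100 Mbps ports plus two 1 Gbps ports, counted once for
# receive and once for send.
ports_bps = 8 * 100 * MBPS + 2 * 1 * GBPS
send_receive_bps = 2 * ports_bps

print("DRAM:", dram_bps / GBPS, "Gbps")
print("IX bus:", ix_bus_bps / GBPS, "Gbps")
print("Send/receive:", send_receive_bps / GBPS, "Gbps")
```

Under these assumptions the combined send/receive demand of all ports exceeds the raw IX bus capacity, which is consistent with the slide quoting 5.6 Gbps against roughly 4 Gbps.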

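Forwarding Pipeline
- The common unit of data is the 64-byte MAC-packet (MP).
- The MAC breaks each packet into MPs and tags each one as the first, an intermediate, the last, or the only MP of the packet.
- The MEs allocate FIFO slots to the MACs, drain the input FIFO, and fill the output FIFO.
- Can the MEs move an MP from the input FIFO to the output FIFO in a single step?
- The design uses a 2-stage pipeline.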
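The MP tagging described above can be sketched in software. This is a minimal illustration, not the MAC hardware's behavior in detail; the function name `split_into_mps` and the string tags are assumptions made for the example.

```python
MP_SIZE = 64  # bytes per MAC-packet (MP), as stated on the slide


def split_into_mps(packet: bytes):
    """Split a packet into 64-byte MPs and tag each one.

    Returns a list of (tag, chunk) pairs, where tag is one of
    "only", "first", "intermediate", "last" (hypothetical labels).
    """
    if not packet:
        raise ValueError("empty packet")
    chunks = [packet[i:i + MP_SIZE] for i in range(0, len(packet), MP_SIZE)]
    if len(chunks) == 1:
        return [("only", chunks[0])]
    tagged = []
    for idx, chunk in enumerate(chunks):
        if idx == 0:
            tag = "first"
        elif idx == len(chunks) - 1:
            tag = "last"
        else:
            tag = "intermediate"
        tagged.append((tag, chunk))
    return tagged


# A minimum-sized 64-byte packet fits in one MP; a 200-byte packet needs four.
print([tag for tag, _ in split_into_mps(bytes(64))])
print([tag for tag, _ in split_into_mps(bytes(200))])
```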

Input Processing

    INPUT_LOOP:
        acquire_input_mutex()
        if (!port_rdy(p)) goto INPUT_LOOP
        load IN_FIFO[c]
        release_input_mutex()
        mp_addr = calculate_mp_addr()
        copy reg_mp_data <- IN_FIFO[c]
        state = protocol_processing(reg_mp_data)
        copy reg_mp_data -> DRAM[mp_addr]
        if (at_start_of_packet(state))
            enqueue(state, state.queue)
        goto INPUT_LOOP

- protocol_processing for IP: validate the header, update the TTL, recompute the checksum, set the source and destination MAC addresses, and select the destination queue.
- Strict binding of FIFO slots to contexts.
- Minimal forwarder: one-cycle hardware hash.
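Three of the IP steps listed for protocol_processing (validating the header, updating the TTL, recomputing the checksum) can be illustrated in software. The sketch below is not the MicroEngine code from the paper. It assumes an option-free 20-byte IPv4 header, and the function names are hypothetical. Setting MAC addresses and choosing the destination queue are left out.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Standard Internet checksum: one's-complement sum of 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def process_ipv4_header(header: bytes) -> bytes:
    """Validate the header, decrement the TTL, and recompute the checksum."""
    if len(header) != 20:
        raise ValueError("sketch handles only option-free 20-byte headers")
    if header[0] >> 4 != 4 or header[0] & 0x0F != 5:
        raise ValueError("not an option-free IPv4 header")
    if ipv4_checksum(header) != 0:          # a valid header sums to zero
        raise ValueError("bad header checksum")
    ttl = header[8]
    if ttl <= 1:
        raise ValueError("TTL expired")
    out = bytearray(header)
    out[8] = ttl - 1
    out[10:12] = b"\x00\x00"                # clear the old checksum first
    out[10:12] = struct.pack("!H", ipv4_checksum(bytes(out)))
    return bytes(out)


# Example header with a valid checksum (0xb861) and TTL 64.
hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
forwarded = process_ipv4_header(hdr)
print(forwarded.hex())
```

A full recomputation is used here for clarity; a fast path could instead update the checksum incrementally, since only the TTL byte changes.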

Scheduling & Buffering
- A queue that is serviced by the StrongARM.
- A set of contexts is statically allocated to run the input loop: 16 input contexts.
- Token passing (a hardware signaling mechanism) serializes DMA access.
- Buffer scheduling: 16 MB of DRAM (8192 buffers of 2 KB each) is consumed in a circular fashion, using a shared state variable.
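The circular buffer scheme can be sketched as follows. This is a single-threaded illustration under stated assumptions: the class name `CircularBufferPool` is invented, the "shared state variable" is modeled as a plain index, and the serialization that the real input contexts would need is omitted.

```python
BUFFER_SIZE = 2 * 1024   # 2 KB per buffer
NUM_BUFFERS = 8192       # 8192 buffers, 16 MB in total


class CircularBufferPool:
    """Hands out DRAM buffer addresses in strict circular order."""

    def __init__(self, base_addr=0):
        self.base_addr = base_addr
        self.next_index = 0          # stands in for the shared state variable

    def allocate(self):
        """Return the address of the next buffer and advance the index."""
        addr = self.base_addr + self.next_index * BUFFER_SIZE
        self.next_index = (self.next_index + 1) % NUM_BUFFERS
        return addr


pool = CircularBufferPool()
first = pool.allocate()
second = pool.allocate()
print(first, second)
```

Note that in this sketch a buffer is simply reused after 8192 later allocations, with no explicit free operation; whether the original design adds any check before reuse is not stated on the slide.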

Output Processing

    OUTPUT_LOOP:
        acquire_output_mutex()
        release_output_mutex()
        if (finished_last_packet)
            qid = select_queue()
            state = dequeue(qid)
            mp_addr = first_mp(state)
        else
            mp_addr = next_mp(state)
        fifo_addr = calculate_fifo_addr()
        copy DRAM[mp_addr] -> OUT_FIFO[fifo_addr]
        enable OUT_FIFO[fifo_addr]
        finished_last_packet = at_end_of_packet(state)
        goto OUTPUT_LOOP

select_queue() picks a non-empty queue from that port's queues (scheduling).

Queuing
- Queues are assigned statically to output contexts. An output context keeps its queues in 16 registers rather than in scratch memory.
- With multiple queues, which one is served next? The queues are prioritized.
- Queues are circular arrays of 32-bit entries in SRAM.
- Contention can be handled in two ways:
  1. Use mutexes.
  2. Have a queue for each input at each output -> single priority level.
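A circular array of 32-bit entries with priority-based selection might look like the sketch below. It is an illustration only: the class and function names are hypothetical, a Python list stands in for SRAM, and index 0 is assumed to be the highest priority.

```python
class CircularQueue:
    """Fixed-capacity circular array of 32-bit entries."""

    def __init__(self, capacity):
        self.entries = [0] * capacity
        self.head = 0    # next entry to dequeue
        self.tail = 0    # next free slot
        self.count = 0

    def enqueue(self, entry):
        """Add an entry; return False if the queue is full."""
        if self.count == len(self.entries):
            return False
        self.entries[self.tail] = entry & 0xFFFFFFFF  # keep 32 bits
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return True

    def dequeue(self):
        """Remove and return the oldest entry, or None if empty."""
        if self.count == 0:
            return None
        entry = self.entries[self.head]
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return entry


def select_queue(queues):
    """Return the index of the first non-empty queue (0 = highest priority)."""
    for qid, queue in enumerate(queues):
        if queue.count:
            return qid
    return None


queues = [CircularQueue(4), CircularQueue(4)]
queues[1].enqueue(0xAAAA)
queues[0].enqueue(0xBBBB)
print(select_queue(queues))
```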

Queuing [cont]
- I.2 + O.1
- I.2 + O.3: maximum flexibility
- I.1 + O.3: slower rate

Evaluation
For one MP:
- 280 cycles for register operations
- 180 (DRAM) + 90 (SRAM) + 160 (scratch) = 430 cycles for memory
- Total = 710 cycles = 3550 ns (at 200 MHz)
At 3.47 Mpps, one packet is completed every 288 ns.
Result: the system forwards about 12 packets in parallel (3550 ns / 288 ns ≈ 12).
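The arithmetic behind the "12 packets in parallel" figure can be reproduced directly from the numbers on this slide. The variable names are chosen for this example only.

```python
CLOCK_HZ = 200_000_000          # 200 MHz IXP1200
RATE_PPS = 3_470_000            # 3.47 Mpps forwarding rate

register_cycles = 280
memory_cycles = 180 + 90 + 160  # DRAM + SRAM + scratch
total_cycles = register_cycles + memory_cycles

ns_per_cycle = 1_000_000_000 / CLOCK_HZ
latency_ns = total_cycles * ns_per_cycle   # time spent on one MP
interarrival_ns = 1_000_000_000 / RATE_PPS # time between completed packets

# Packets that must be in flight at once to sustain the rate.
packets_in_flight = latency_ns / interarrival_ns

print(total_cycles, latency_ns, round(interarrival_ns), int(packets_in_flight))
```

The ratio comes out slightly above 12, which the slide rounds to 12.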

Switching Paths
- Path A: forwards packets at the maximum rate of 3.47 Mpps.
- Path B: forwards packets at 526 Kpps.
- Path C: forwards packets at 534 Kpps (500 cycles per packet).
Notes: the StrongARM is involved too. No additional tasks for the MEs.

StrongARM
The OS on the StrongARM:
1. Acts as a bridge that forwards packets to the P3.
2. Supports a small collection of local forwarders.
Simple priority scheme: packets being passed to the P3 get priority over packets that are to be processed locally.
Deciding which forwarders to run on the StrongARM is complicated:
- It supports the Pentium.
- It shares resources with the MEs and can act like them.

Virtual Router Processor
The MEs are statically divided between 2 tasks:
- A router infrastructure (RI) that is able to forward minimum-sized packets.
- A virtual router processor (VRP) that runs additional code on behalf of each packet.
protocol_processing runs on this abstract machine.

Interfacing & Implementation
The StrongARM interacts with the MEs through:

    fid = install(key, fwdr, size, where)
    remove(fid)
    data = getdata(fid)
    setdata(fid, data)

- install installs the forwarder fwdr for packets that match the key, with the specified flow size; where indicates the processor.
- Key: (src addr, src port, dst addr, dst port)
- Where:
  - ME: load from the StrongARM into the ME's ISTORE
  - SA: load into DRAM
  - PE: load into the Pentium jump table
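To make the shape of this interface concrete, here is a small in-memory mock of the four calls. It is only a sketch: the class `ForwarderTable`, the fid numbering, and the error handling are assumptions for illustration, and nothing is actually loaded into an ISTORE, DRAM, or a jump table.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Optional, Tuple

VALID_LEVELS = {"ME", "SA", "PE"}   # MicroEngine, StrongARM, Pentium


@dataclass
class InstalledForwarder:
    key: Tuple[str, int, str, int]  # (src addr, src port, dst addr, dst port)
    fwdr: Any                       # the forwarder code (opaque here)
    size: int
    where: str
    data: Optional[bytes] = None    # per-forwarder data, unset at install


class ForwarderTable:
    """Hypothetical mock of install/remove/getdata/setdata."""

    def __init__(self):
        self._next_fid = count(1)
        self._entries = {}

    def install(self, key, fwdr, size, where):
        if where not in VALID_LEVELS:
            raise ValueError(f"unknown processor level: {where}")
        fid = next(self._next_fid)
        self._entries[fid] = InstalledForwarder(key, fwdr, size, where)
        return fid

    def remove(self, fid):
        del self._entries[fid]

    def getdata(self, fid):
        return self._entries[fid].data

    def setdata(self, fid, data):
        self._entries[fid].data = data


table = ForwarderTable()
fid = table.install(("10.0.0.1", 80, "10.0.0.2", 1234), "port_filter", 64, "ME")
table.setdata(fid, b"\x01")
print(fid, table.getdata(fid))
```

The example key and the forwarder name `"port_filter"` are placeholders, not values from the paper.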

Interfacing
Some data forwarders:

Conclusions
- Shows how to program the processor hierarchy with a fixed forwarding infrastructure that fully exploits the parallelism available on the IXP1200 MicroEngines.
- Demonstrates how new functionality can be injected into all three levels of the processor hierarchy.
- Statically partitions the processing capacity of the MicroEngines into a fixed routing infrastructure and a programmable VRP.
- The approach can be used in many designs.