Haobo Wang, Ramesh Karri Polytechnic University

Presentation transcript:

A Hardware-Accelerated Implementation of the RSVP-TE Signaling Protocol Haobo Wang, Ramesh Karri Polytechnic University Malathi Veeraraghavan, Tao Li University of Virginia

Outline
  Background and problem statement
  A subset of RSVP-TE signaling protocol
  Hardware-accelerated implementation
  Conclusions and future work
Speaker notes: I repeat this outline for each section. Light blue shows the current section.
2019/1/16

Outline
  Background and problem statement (current section)
  A subset of RSVP-TE signaling protocol
  Hardware-accelerated implementation
  Conclusions and future work

Background
Signaling protocol
  Control-plane protocol
  Sets up and tears down connections in connection-oriented networks
RSVP-TE for GMPLS
  Supports a wide range of connection-oriented networks
  Being implemented by switch vendors
Signaling protocols are primarily implemented in software
  Two reasons: complexity and flexibility
  Price paid: poor performance
Speaker notes: I'm wondering whether we should exchange items 2 and 3. First we talk about signaling protocols in general, followed by a specific signaling protocol, RSVP-TE. Then we explain why signaling protocols are primarily implemented in software.

Problem statement
Implement a subset of the RSVP-TE signaling protocol in reconfigurable hardware
  Specifically tailored for SONET switches
  Achieve high call handling capacity and low delay
Challenges
  RSVP → RSVP-TE (MPLS) → RSVP-TE (GMPLS)
  A large number of messages, objects, and parameters
  TLV (Type-Length-Value) style object processing
  Maintaining per-connection state information
  Sophisticated data table manipulations
  Timers
  And more...
Speaker notes: Our work is to implement a subset of the RSVP-TE signaling protocol in hardware. Our target is the SONET switch, and we expect better performance. What are the difficulties? The challenges listed here, the innovative solutions on the next slide, and the implementation details on slides 15-19 are related.

How to solve these challenges?
Partition signaling functions
  Implement time-critical functions in hardware, and relegate non-time-critical functions to software
Innovative hardware implementations
  Message/object dispatchers
  Data table management
  Retransmission management and timers
  And more...

Outline
  Background and problem statement
  A subset of RSVP-TE signaling protocol (current section)
  Hardware-accelerated implementation
  Conclusions and future work

A subset of RSVP-TE: Messages
Nine messages
  Path, Resv, PathTear, and ResvTear
    Set up and tear down connections
    Time-critical -> hardware
  All other messages
    Non-time-critical -> software
Each message consists of a common header and a variable number of objects

A subset of RSVP-TE: Objects
Type-Length-Value (TLV) style objects
  Flexible, but a challenge for hardware implementation
All mandatory objects and three optional objects are processed by hardware
  SUGGESTED_LABEL: forward label allocation
  MSG_ID, MSG_ID_ACK: reliable transmission

A subset of RSVP-TE: Procedures (setup)
Five steps at each switch
  Determine next hop
  Reserve resources
  Allocate "labels"
  Program switch fabric
  Update state information
[Figure labels: Forward direction, Backward direction]
Speaker notes: Here we use the setup procedure as an example. Releasing a connection follows the same end-to-end procedure. Note that the Path and Resv messages are used together to set up a connection, whereas either PathTear or ResvTear alone can tear one down. About these five steps: our implementation performs the first 3 steps in the forward direction. According to the RSVP specification, however, labels can instead be allocated in the backward direction. In the paper, we justify our choice.
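
The five per-switch setup steps can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the `ToySwitch` class, its table layouts, and the use of a timeslot number as the "label" are all hypothetical, and the sketch runs all five steps in one call rather than splitting them between the forward and backward directions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToySwitch:
    """Hypothetical model of one switch; every table layout here is an assumption."""
    routes: Dict[str, str]              # destination -> outgoing interface
    free_slots: Dict[str, List[int]]    # outgoing interface -> free timeslots
    cross_connects: List[Tuple[str, int, str, int]] = field(default_factory=list)
    state: Dict[int, dict] = field(default_factory=dict)

    def setup(self, conn_id: int, dest: str, in_if: str, in_slot: int) -> Optional[int]:
        # 1. Determine next hop.
        out_if = self.routes.get(dest)
        if out_if is None:
            return None
        # 2. Reserve resources: admit the call only if a timeslot is free.
        slots = self.free_slots.get(out_if, [])
        if not slots:
            return None
        # 3. Allocate a "label" (modeled here as the outgoing timeslot).
        out_slot = slots.pop(0)
        # 4. Program the switch fabric (modeled as recording a cross-connect).
        self.cross_connects.append((in_if, in_slot, out_if, out_slot))
        # 5. Update per-connection state.
        self.state[conn_id] = {"in": (in_if, in_slot), "out": (out_if, out_slot)}
        return out_slot


switch = ToySwitch(routes={"B": "if1"}, free_slots={"if1": [3, 4]})
print(switch.setup(conn_id=1, dest="B", in_if="if0", in_slot=9))
```

A call that finds no route or no free timeslot returns `None`, which stands in for rejecting the setup request.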

Outline
  Background and problem statement
  A subset of RSVP-TE signaling protocol
  Hardware-accelerated implementation (current section)
  Conclusions and future work

Network and node views

Architecture of the prototype board
Speaker notes:
FIFO: 64K by 36. Holds messages to be passed to software. A message is 125 B, which occupies 32 32-bit words, so the FIFO can hold about 2,000 messages.
TCAM: 64K by 72 device, organized as 16 segments of 4K entries each. Each segment can be configured as 4K by 72, 2K by 144, 1K by 288, or 512 by 576.
SRAM: 128K by 36. SRAM0 holds the data tables, along with the TCAM; SRAM1 holds messages waiting for ACKs.
What is a ternary CAM? In a CAM, we match the stored contents against the data value we are looking up. With IP addresses, however, a bit should not be compared if the subnet mask has a 0 in that position. Each stored bit is therefore 1, 0, or x (don't care), which is why we need a TCAM. It is realized with two 64K by 72 tables, one for data and one for the mask.
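
The ternary matching described in the notes (a data table plus a mask table) can be modeled in a few lines of software. This is a minimal sketch, assuming a priority-ordered list of entries. The `ToyTcam` class and the `result` field (standing in for an index into an associated SRAM table) are hypothetical names, and a real TCAM compares all entries in parallel rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TernaryEntry:
    value: int   # data bits
    mask: int    # 1 = compare this bit, 0 = don't care ("x")
    result: int  # hypothetical index into an associated SRAM table


class ToyTcam:
    """Software model of a ternary CAM; earlier entries have higher priority."""

    def __init__(self) -> None:
        self.entries: List[TernaryEntry] = []

    def add(self, value: int, mask: int, result: int) -> None:
        # Store the value pre-masked so don't-care bits never affect a match.
        self.entries.append(TernaryEntry(value & mask, mask, result))

    def lookup(self, key: int) -> Optional[int]:
        # Hardware checks every entry at once; this loop models that sequentially.
        for entry in self.entries:
            if key & entry.mask == entry.value:
                return entry.result
        return None


def ipv4(text: str) -> int:
    """Convert dotted-quad notation to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


tcam = ToyTcam()
# Longer prefixes are stored first so that they win over shorter ones.
tcam.add(ipv4("10.1.2.0"), ipv4("255.255.255.0"), result=7)
tcam.add(ipv4("10.1.0.0"), ipv4("255.255.0.0"), result=3)

print(tcam.lookup(ipv4("10.1.2.9")))     # matches the /24 entry
print(tcam.lookup(ipv4("10.1.9.9")))     # falls through to the /16 entry
print(tcam.lookup(ipv4("192.168.0.1")))  # no entry matches
```

Storing longer prefixes ahead of shorter ones is one common way to get longest-prefix matching out of a priority-ordered TCAM; whether the prototype orders its entries this way is not stated in the slides.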

Architecture of the hardware signaling accelerator (FPGA)

TLV-style object processing
[Figure: two levels of dispatchers. Labels: Message Dispatcher; Object Dispatchers (SESSION, RSVP_HOP); Unknown obj. Processor]
Speaker notes: There is some animation here. First show the message dispatcher, then show the multiple object dispatchers. Then point out that TLV processing consists of these two levels of dispatchers. The function of the message dispatcher is to delimit objects. The object dispatchers work independently and in parallel, and only the matching object dispatcher is triggered.
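
The two-level dispatch can be sketched in software as follows. This is an illustrative model only, not the FPGA design: it assumes the RFC 2205 layouts (an 8-byte common header, and a 4-byte object header carrying length, Class-Num, and C-Type), and all function names are hypothetical. In hardware the object dispatchers run in parallel; here a dictionary lookup stands in for "only the matching dispatcher is triggered".

```python
import struct
from typing import Callable, Dict, Iterator, List, Tuple

# Message types and class numbers as assigned in RFC 2205.
MSG_PATH, MSG_RESV, MSG_PATHTEAR, MSG_RESVTEAR = 1, 2, 5, 6
CLASS_SESSION, CLASS_RSVP_HOP = 1, 3

COMMON_HEADER_LEN = 8
OBJECT_HEADER_LEN = 4


def delimit_objects(message: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Level 1 (message dispatcher): find object boundaries from the length fields."""
    offset = COMMON_HEADER_LEN
    while offset < len(message):
        if offset + OBJECT_HEADER_LEN > len(message):
            raise ValueError("truncated object header")
        length, class_num, c_type = struct.unpack_from("!HBB", message, offset)
        if length < OBJECT_HEADER_LEN or length % 4 or offset + length > len(message):
            raise ValueError("malformed object at offset %d" % offset)
        yield class_num, c_type, message[offset + OBJECT_HEADER_LEN:offset + length]
        offset += length


# Level 2 (object dispatchers): one placeholder handler per known class.
def handle_session(c_type: int, body: bytes) -> str:
    return "SESSION"


def handle_rsvp_hop(c_type: int, body: bytes) -> str:
    return "RSVP_HOP"


def handle_unknown(class_num: int, c_type: int, body: bytes) -> str:
    return "unknown class %d" % class_num


OBJECT_DISPATCHERS: Dict[int, Callable[[int, bytes], str]] = {
    CLASS_SESSION: handle_session,
    CLASS_RSVP_HOP: handle_rsvp_hop,
}


def process(message: bytes) -> Tuple[int, List[str]]:
    """Parse the common header, then hand each object to its dispatcher."""
    if len(message) < COMMON_HEADER_LEN:
        raise ValueError("message shorter than the common header")
    _vf, msg_type, _cksum, _ttl, _rsvd, total_len = struct.unpack_from("!BBHBBH", message, 0)
    if total_len != len(message):
        raise ValueError("length field does not match message size")
    handled = []
    for class_num, c_type, body in delimit_objects(message):
        handler = OBJECT_DISPATCHERS.get(class_num)
        if handler is None:
            handled.append(handle_unknown(class_num, c_type, body))
        else:
            handled.append(handler(c_type, body))
    return msg_type, handled


def make_object(class_num: int, c_type: int, body: bytes) -> bytes:
    return struct.pack("!HBB", OBJECT_HEADER_LEN + len(body), class_num, c_type) + body


def make_message(msg_type: int, objects: List[bytes]) -> bytes:
    payload = b"".join(objects)
    # Version 1 in the high nibble; the checksum is left at 0 in this sketch.
    header = struct.pack("!BBHBBH", 0x10, msg_type, 0, 64, 0, COMMON_HEADER_LEN + len(payload))
    return header + payload


demo = make_message(MSG_PATH, [
    make_object(CLASS_SESSION, 7, bytes(12)),
    make_object(CLASS_RSVP_HOP, 1, bytes(8)),
    make_object(200, 1, bytes(4)),  # a class this sketch has no handler for
])
print(process(demo))
```

The object bodies in the demo are zero-filled placeholders; a real handler would decode the fields for its C-Type.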

Six data tables
Speaker notes: I use bubbles of different colors to match the table locations on the next slide.

Organization of the data tables
Speaker notes: This figure is not clear. Please mention that these tables are located in three devices: the TCAM, the SRAM, and the internal SRAM inside the FPGA. Also show the audience that the routing table (#1) is located in both the TCAM and the SRAM, while the Incoming Connectivity table and the Outgoing Connectivity table are located in the TCAM only (the return value from the TCAM, an address, is what we need). The Outgoing CAC table is located completely inside the FPGA, for two reasons: first, this table is accessed several times during the processing of each message, so we need to speed up access; second, it is small enough to fit into the FPGA's internal memory. The State table is indexed by a 128-bit global ID and returns 228 bits of state information (it is expandable, so we can maintain more state information).
The outgoing and incoming connectivity tables do need the SRAM because we need to extract 6 (if id) + 6 (phyid) + 12 (timeslots). Haobo took some implementation shortcuts.

Retransmission management (buffers and timers)
Speaker notes: This is a simplified version. RSVP-TE is built upon unreliable IP; RFC 2961 proposed an exponential back-off retransmission algorithm and introduced the MESSAGE_ID and MESSAGE_ID_ACK objects.
On the transmitting side, unacknowledged messages are buffered. The buffer is organized as three segments, corresponding to the initial transmission, the first retransmission, and the second retransmission (we try at most 3 times). Each buffered message also has an associated time tag. Messages are buffered in FIFOs (first in, first out), so the head is always the oldest message and we only need to compare the head message's time tag with the timer. There are 3 separate FIFOs, for the initial transmission, the 1st retransmission, and the 2nd retransmission, so in total we have 3 timers on the transmitting side.
On the receiving side, each time a MESSAGE_ID is received, we do not immediately send the corresponding MESSAGE_ID_ACK object in a separate ACK message. Instead, we save the MESSAGE_ID_ACK object in buffers, which are organized by destination address (one per neighbor). We have k timers, where k is the number of neighbors. When a timer expires, we have no choice but to send out an ACK message that includes all pending MSG_ID_ACKs. If we can, we instead piggyback the MSG_ID_ACKs on an ordinary signaling message.
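
The transmit-side scheme (one FIFO per attempt, with only the head's time tag checked) can be sketched as below. This is a simplified software model, not the hardware implementation. The `RetransmitManager` class, the timeout values (1, 2, and 4 time units, i.e. a back-off factor of 2), and the lazy removal of acknowledged messages when they reach the head of a FIFO are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, List, Set, Tuple


class RetransmitManager:
    """Hypothetical model: FIFO i holds messages waiting for an ACK after attempt i + 1."""

    def __init__(self, initial_timeout: float = 1.0, backoff: float = 2.0,
                 attempts: int = 3) -> None:
        # Exponential back-off: each attempt waits `backoff` times longer (assumed values).
        self.timeouts = [initial_timeout * backoff ** i for i in range(attempts)]
        self.fifos: List[Deque[Tuple[float, int]]] = [deque() for _ in range(attempts)]
        self.acked: Set[int] = set()
        self.sent: List[Tuple[float, int]] = []  # log of (time, msg_id) transmissions
        self.failed: List[int] = []              # messages given up on

    def send(self, now: float, msg_id: int) -> None:
        self.sent.append((now, msg_id))
        self.fifos[0].append((now, msg_id))      # time tag = transmission time

    def ack(self, msg_id: int) -> None:
        self.acked.add(msg_id)

    def tick(self, now: float) -> None:
        """Check only the head of each FIFO, since the head is always the oldest entry."""
        for stage, fifo in enumerate(self.fifos):
            while fifo:
                time_tag, msg_id = fifo[0]
                if msg_id in self.acked:
                    fifo.popleft()               # acknowledged: drop silently
                    self.acked.discard(msg_id)
                    continue
                if now - time_tag < self.timeouts[stage]:
                    break                        # head not expired, so nothing behind it is
                fifo.popleft()
                if stage + 1 < len(self.fifos):
                    self.sent.append((now, msg_id))             # retransmit
                    self.fifos[stage + 1].append((now, msg_id))
                else:
                    self.failed.append(msg_id)                  # all attempts used


mgr = RetransmitManager()
mgr.send(0.0, msg_id=1)
mgr.send(0.5, msg_id=2)
mgr.ack(2)          # message 2 is acknowledged, message 1 never is
for t in (1.0, 3.0, 7.0):
    mgr.tick(t)
print(mgr.sent, mgr.failed)
```

Because entries within a FIFO are in time-tag order, an unexpired head guarantees that every entry behind it is also unexpired, which is why one comparison per FIFO is enough.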

Processing of Path message
Speaker notes: There is some animation here. My idea is to show the complexity of message processing. Using the Path message as an example, there are three levels of processing. First, we process the common header and find out that it is a Path message. Next, we process the Path message and determine that we need to process the SESSION object. Then we process the SESSION object. On the next slide, I'll show the simulation result for the Path message.

Processing of Path message: timing simulation
Speaker notes: This is the timing simulation of the processing of a Path message. I have roughly marked the main operations.

Implementation and simulation results
Implementation results
  Device     PCI core   Resource   Eq. gates   Max freq.
  XC2V3000   w/o PCI    12%        360,000     90 MHz
  XC2V3000   w/ PCI     21%        630,000     50 MHz
Simulation results (@50 MHz)
                 Path   Resv   PathTear/ResvTear
  Clock cycles   40     32     19
Speaker notes: About the implementation results: including the PCI core we got from Xilinx, the maximum frequency is around 50 MHz (according to the Xilinx report). It is 90 MHz without the PCI core. About the simulation results: the processing times differ from message to message, but in order to fully pipeline the processing of messages, idle cycles are inserted where necessary. The three stages are receiving, processing, and transmitting.

Outline
  Background and problem statement
  A subset of RSVP-TE signaling protocol
  Hardware-accelerated implementation
  Conclusions and future work (current section)

Conclusions and applications
Feasibility
  Yes, a hardware-accelerated implementation of RSVP-TE is possible
Performance
  400,000 calls/sec, 7.2 μs (@50 MHz)
  100x-1000x speedup vis-à-vis software implementations
CHEETAH: Circuit-switched High-Speed End-to-End Transport Architecture (Opticomm 2003)
  File transfer application
    Set up a high-speed circuit (GbE) end-to-end, carried on SONET over the wide area
    Transmit a single file (a 5 MB file on a Gb/s circuit needs only 40 ms)
    Release the circuit
  Intense sharing of circuits
    Hence the high call handling volume
  Need for very short call processing times
    Consider metro settings with a low propagation delay
    Primary component of call setup delay: call processing delay
    The smaller the call setup delays, the smaller the files that can be handled with circuits
    The greater the load, the better the utilization
Speaker notes: The call handling rate is 400,000 calls/sec, and the per-switch delay (including message receiving, processing, and transmitting) is 7.2 μs. 400K is not the reciprocal of 7.2 μs because the three stages are fully pipelined, so the hardware accelerator can accept a new message every 2.4 μs.
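
The relationship between per-switch delay and call handling rate can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that takes the per-switch delay to be 7.2 microseconds (the unit consistent with roughly 400,000 calls/sec) and assumes that the three pipeline stages are of equal length. The variable names are illustrative.

```python
STAGES = 3                      # receiving, processing, transmitting
PER_SWITCH_DELAY_S = 7.2e-6     # assumed: 7.2 microseconds end to end

# With a full pipeline, a new message enters once per stage time,
# so throughput is set by the stage time, not by the total delay.
stage_time_s = PER_SWITCH_DELAY_S / STAGES
calls_per_second = 1.0 / stage_time_s

# File transfer example from the slide: 5 MB over a 1 Gb/s circuit.
file_bits = 5e6 * 8
transfer_time_s = file_bits / 1e9

print(f"stage time: {stage_time_s * 1e6:.1f} us")
print(f"call handling rate: {calls_per_second:,.0f} calls/sec")
print(f"5 MB transfer time: {transfer_time_s * 1e3:.0f} ms")
```

The computed rate comes out slightly above 400,000 calls/sec, which suggests that the slide's figure is rounded.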

Thank you! Please visit our website for more information: http://eeweb1.poly.edu/networks/html-files/index.htm