CprE / ComS 583 Reconfigurable Computing

Lecture #8 – Applications II
Prof. Joseph Zambreno
Department of Electrical and Computer Engineering, Iowa State University

Recap – Splash 1 Architecture
[Figure: Splash 1 block diagram showing the VME bus and VSB bus interfaces, FIFO IN, control, FIFO OUT, FPGAs F0–F31, and memories M0–M31.]
September 14, 2006, CprE 583 – Reconfigurable Computing

Recap – Splash 2 Architecture
[Figure: Splash 2 array board showing the link from the previous board, FPGAs F0–F16, memories M0–M16, the crossbar switch, and the link to the next board.]
The crossbar switch takes 9 nibbles in, each of which can be routed to any destination. It can also change between 8 configurations on the fly. Its full reprogrammability was never exploited, as the designers never used more than 4 modes, and often only 1.

Recap – Splash 2 Architecture
[Figure: Splash 2 system showing the Sparc-based host, interface board, array boards, external input and output, and the S-Bus, R-Bus, and SIMD bus.]
The external I/O ports were probably inspired by the PAM, which allowed direct connections to external devices. The (pink) SBus is used either as an extension of the SBus line from the host to load programs, or as a return path for data from the array boards. The RBus is used in Systolic Mode. The SIMD Bus is used in SIMD Mode. The lines on the right chain the boards together in Systolic Mode using the RBus.

Recap – Dictionary Search
Shift amount: 7 bits
Hash function:             1100 1000 1010 0011
Clear hash register:       00 0000 0000 0000 0000 0000
Input the letters “th”:    01 1010 0001 1101 00
Temporary result:          10 1000 0011 0101 1100 0000
Result for “th”:           10 0000 0101 0000 0110 1011
Input for letters “e_”:    00 0000 0001 1001 01
Temporary result:          01 0010 0110 0001 1110 1011
Result for “the_”:         10 0101 1010 0100 1100 0011
- XOR the two-character value with the temporary result and the hash function
- Rotate the result
- Different hash function for each FPGA
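The XOR-and-rotate step above can be replayed in a few lines. This is a minimal sketch, not the Splash 2 implementation: it assumes a 22-bit hash register (the width of the result rows), that the 16-bit hash pattern and the 16-bit character pair are left-aligned in that register, that the second character sits in the upper byte, that “_” is a zero byte, and that the rotation is to the right. Under those assumptions the sketch reproduces both temporary results and the result for “th” exactly; for “the_” it yields 11 0101 1010 0100 1100 0011, which differs from the row printed on the slide in one bit.

```python
WIDTH = 22                           # register width, taken from the 22-bit rows
MASK = (1 << WIDTH) - 1
SHIFT = 7                            # "Shift amount: 7 bits"
HASH_FN = 0b1100100010100011 << 6    # 16-bit pattern, left-aligned (assumption)


def rotate_right(value, amount):
    """Rotate a WIDTH-bit value right by `amount` bits."""
    return ((value >> amount) | (value << (WIDTH - amount))) & MASK


def hash_step(register, first, second):
    """XOR one character pair into the register, then rotate.

    Returns (temporary result, rotated result).
    """
    word = ((second << 8) | first) << 6      # pair left-aligned in the register
    temp = register ^ word ^ HASH_FN
    return temp, rotate_right(temp, SHIFT)


def bits(value):
    """Format a register value as a 22-character bit string."""
    return format(value, "022b")


temp_th, reg_th = hash_step(0, ord("t"), ord("h"))   # start from a cleared register
temp_e, reg_e = hash_step(reg_th, ord("e"), 0)       # "_" taken as a zero byte

print(bits(temp_th), bits(reg_th))
print(bits(temp_e), bits(reg_e))
```

Giving each FPGA its own `HASH_FN` constant is all it takes to get the "different hash function for each FPGA" behaviour in this model.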

Outline
- Recap
- The Field-Programmable Port Extender (FPX)
  - FPX Architecture
  - FPX Programming Model
- FPX Applications
  - Pattern Matching
  - Packet Classification
  - Rule Processing

Application – Network Processing
- Networking applications are well suited to reconfigurable hardware
  - Target signatures change often
  - Massive quantities of stream-based data
  - Repetitive operations
- Connecting up to a realistic networking environment is hard
- The Washington University experimental setup is one of the best
  - Shows the importance of both memory and processing capability
  - Numerous experiments performed over the past five years

Network Routing with the FPX
- FPX modules distributed across each port of a switch
- IP packets (over ATM) enter and depart the line card
- Packet fragments processed by modules
- Advantages:
  - New protocols implemented directly in silicon
  - Easy to upgrade in the field
The FPX processes packets at the edge of a switch, between the fiber-optic line card and the backplane switch fabric. The FPX sees flows in both the ingress (input to the switch) and egress (exit from the switch) directions. The throughput of the packet processing system must be at least the maximum link speed of the fiber-optic card. The current FPX handles link rates up to OC-48.

FPX Hardware Device
- The FPX is implemented on a 12-layer circuit board

FPX Hardware in a WUGS-20 Switch
- Washington University Gigabit Switch (WUGS)
- Up to 160 Gbps of bandwidth

FPGA-based Router
- FPX module contains two FPGAs
  - NID (network interface device): performs data queuing
  - RAD (reprogrammable application device): specialized control sequences

Reprogrammable Application Device
- Spatial re-use of FPGA resources
  - Modules implemented using FPGA logic
  - Module logic can be individually reprogrammed
- Shared access to off-chip resources
  - Memory interfaces to SRAM and SDRAM
  - Common datapath to send and receive data

Architecture of the FPX
- RAD
  - Large Xilinx FPGA
  - Attaches to SRAM and SDRAM
  - Reprogrammable over the network
  - Provides two user-defined module interfaces
- NID
  - Provides Utopia interfaces between switch and line card
  - Forwards cells to the RAD
  - Programs the RAD
The FPX is implemented with two FPGAs: the RAD and the NID. The Reprogrammable Application Device (RAD) contains the user-defined modules, each providing application-specific functionality. External SRAM and SDRAM connect to the RAD. The NID contains the logic to interface with the switch and line card, route traffic flows to the RAD, and reprogram the modules on the RAD.

Architecture of the FPX (cont.)

FPX SRAM
- Provides low latency for fast table lookups
- Zero Bus Turnaround (ZBT) allows back-to-back read / write operations every 10 ns
- Dual, independent memories
- 36-bit wide bus
The SRAM memories are pipelined so that they can be fully utilized. With Zero Bus Turnaround (ZBT) memory, a memory access has four cycles of latency. The SRAM is well suited for fast memory lookups.

FPX SDRAM
- Dual, independent SDRAM memories
- 64-bit wide, 100 MHz
- 64 Mb / module: 128 Mb total [expandable]
- Burst-based transactions [1-8 word transfers]
- Latency of 14 cycles to read/write an 8-word burst
SDRAM provides cost-effective storage of data. A 64-bit wide, pipelined module allows full-throughput queuing of traffic to and from the RAD.
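The bus widths and clock rates on the two memory slides translate into peak bandwidth figures. The sketch below is only back-of-the-envelope arithmetic: it assumes one word per 10 ns cycle on each interface and, for the last figure, that a single 8-word SDRAM burst occupies all 14 cycles with nothing overlapped. The slides state that the pipelined module avoids that penalty, so this figure is a lower bound for a lone, unpipelined burst.

```python
CLOCK_HZ = 100e6   # 10 ns cycle, as quoted for both SRAM and SDRAM


def peak_gbps(width_bits, clock_hz=CLOCK_HZ):
    """Peak bandwidth of one memory interface in Gb/s (one word per cycle)."""
    return width_bits * clock_hz / 1e9


sram_peak = peak_gbps(36)     # one 36-bit ZBT SRAM
sdram_peak = peak_gbps(64)    # one 64-bit SDRAM

# A lone 8-word burst that takes 14 cycles end to end (assumption: no overlap).
sdram_single_burst = sdram_peak * 8 / 14

print(f"SRAM peak:          {sram_peak:.1f} Gb/s per memory")
print(f"SDRAM peak:         {sdram_peak:.1f} Gb/s per memory")
print(f"SDRAM single burst: {sdram_single_burst:.2f} Gb/s")
```

Each interface on its own peaks above the roughly 2.5 Gb/s OC-48 link rate, which is consistent with the claim that the memories can keep up with line-rate traffic.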

Routing Traffic Flows
- Traffic flows routed among
  - Switch
  - Line Card
  - RAD.Switch
  - RAD.Linecard
- NID functions
  - Check packets for errors
  - Process commands: control, status, and reprogramming
  - Implement per-flow forwarding
[Figure: NID switch with blocks labelled VC, EC, and ccp between the Switch and LineCard ports.]
The NID provides a non-blocking, multi-port switch for forwarding data to the appropriate module. As flows enter the system ...

Typical Flow Configurations
[Figure: six routing diagrams, each showing the VC, EC, ccp, RAD, Switch, and LineCard blocks.]
- Default Flow Action (Bypass)
- Egress Processing (Per-flow Output Queueing)
- Ingress Processing (IP Routing)
- Full RAD Processing (Packet Routing and Reassembly)
- Full Loopback Testing (System Test)
- Partial Loopback Testing (Egress Flow Processing Test)
Several configurations of flow routing are possible. By default, the FPX forwards all flows directly between the ingress and egress ports. To process packets just on the egress path, flows are routed from the switch to a RAD module, then from the RAD module to the line card. To process packets on the ingress path, flows are routed from the line card to a RAD module, then from the RAD module to the switch. Using both modules, packets can be processed on both the ingress and egress paths. Modules can also be chained, allowing packets to be processed by multiple modules as they pass through the FPX.
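The first four configurations can be written down as a small routing table. This is an illustrative model only: the configuration keys, the `flow_path` helper, and the single `"RAD"` hop are assumptions made for the sketch (the real NID distinguishes the RAD.Switch and RAD.Linecard ports), and the two loopback test modes are left out.

```python
# For each configuration: does traffic in that direction detour through the RAD?
CONFIGS = {
    "default (bypass)":    {"ingress": False, "egress": False},
    "egress processing":   {"ingress": False, "egress": True},
    "ingress processing":  {"ingress": True,  "egress": False},
    "full RAD processing": {"ingress": True,  "egress": True},
}


def flow_path(direction, through_rad):
    """List the hops a flow takes through the FPX in one direction."""
    if direction == "ingress":
        src, dst = "LineCard", "Switch"
    elif direction == "egress":
        src, dst = "Switch", "LineCard"
    else:
        raise ValueError(f"unknown direction: {direction}")
    return [src, "RAD", dst] if through_rad else [src, dst]


def paths(config_name):
    """Return the ingress and egress hop lists for a named configuration."""
    config = CONFIGS[config_name]
    return {d: flow_path(d, config[d]) for d in ("ingress", "egress")}


for name in CONFIGS:
    print(name, paths(name))
```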

Reprogramming Logic
- NID programs at boot from EPROM
- Switch controller writes the RAD configuration memory on the NID
  - Configuration file for the RAD arrives over the network via control cells
- Switch controller issues a {Full/Partial} reconfigure command
  - NID reads the RAD configuration memory to program the RAD
  - Performs complete or partial reprogramming of the RAD
[Figure: switch element with IPP, OPP, and LC blocks.]

FPX Interfaces
- Provides a well-defined interface
  - Utopia-like 32-bit fast data interface
  - Flow control allows back-pressure
- Flow routing
  - Arbitrary permutations of packet flows through ports
- Dynamically reprogrammable
  - Other modules continue to operate even while a new module is being reprogrammed
- Memory access
  - Shared access to SRAM and SDRAM
  - Request/Grant protocol

Pattern Matching using the FPX
- Use hardware to detect a pattern in data
- Modify packet based on match
- Pipeline operation to maximize throughput

“Hello, World” Module Function

Logical Implementation
[Figure: datapath with a VCI Match block, an Append “WORLD” to payload block, and a New Cell output.]
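In software terms, the datapath in the figure amounts to a guarded rewrite of the cell payload. The sketch below is a behavioural model, not the module's VHDL: the `Cell` type, the `TARGET_VCI` value, the `HELLO` trigger string, and the choice to overwrite the bytes after the match (so the cell keeps its 48-byte payload) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace

TARGET_VCI = 5            # hypothetical VCI the module listens on
TRIGGER = b"HELLO"        # assumed pattern at the start of the payload
SUFFIX = b" WORLD"        # text written after the pattern


@dataclass(frozen=True)
class Cell:
    vci: int
    payload: bytes        # 48-byte ATM cell payload


def hello_world(cell: Cell) -> Cell:
    """Return a new cell with the suffix written after the trigger on a match."""
    if cell.vci != TARGET_VCI or not cell.payload.startswith(TRIGGER):
        return cell       # no match: pass the cell through unchanged
    end = len(TRIGGER) + len(SUFFIX)
    new_payload = TRIGGER + SUFFIX + cell.payload[end:]
    return replace(cell, payload=new_payload)


incoming = Cell(vci=TARGET_VCI, payload=TRIGGER.ljust(48, b"\x00"))
outgoing = hello_world(incoming)
print(outgoing.payload[:11])
```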

The Wrapper Concept
[Figure: an application (App) block enclosed by nested Wrapper blocks.]

AAL5 Encapsulation
- Payload is packed in cells
- Padding may be added
- 64-bit trailer at the end of the last cell
- Trailer contains CRC-32
- Last-cell indication bit (last bit of PTI field)
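The padding rule can be made concrete: AAL5 pads the frame so that payload, padding, and the 8-byte (64-bit) trailer together fill a whole number of 48-byte cell payloads. The sketch below computes only the layout; the function name is illustrative, and the CRC-32 itself is not computed.

```python
CELL_PAYLOAD_BYTES = 48   # payload bytes carried by one ATM cell
TRAILER_BYTES = 8         # 64-bit AAL5 trailer, which includes the CRC-32


def aal5_layout(payload_len):
    """Return (padding_bytes, cell_count) for a frame of `payload_len` bytes."""
    if payload_len < 0:
        raise ValueError("payload length must be non-negative")
    total = payload_len + TRAILER_BYTES
    cells = -(-total // CELL_PAYLOAD_BYTES)          # ceiling division
    padding = cells * CELL_PAYLOAD_BYTES - total
    return padding, cells


for length in (40, 41, 100):
    pad, cells = aal5_layout(length)
    print(f"{length:4d} payload bytes -> {pad:2d} pad bytes, {cells} cell(s)")
```

A 40-byte payload fits one cell exactly with its trailer, while a 41-byte payload spills into a second cell that is almost entirely padding.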

HelloBob Module
[Figure: layered module with an SRAM interface, input and output ports, the UDP Hello Bob / Echo application, and the UDP, IP, frame, and cell processors.]

Results: Performance
- Operating frequency: 119 MHz
  - 8.4 ns critical path
  - Well within the 10 ns period of the RAD's clock
  - Targeted to the RAD's V1000E-FG680-7
- Maximum packet processing rate: 7.1 million packets per second
  - (100 MHz) / (14 clocks/cell)
  - Circuit handles back-to-back packets
- Slice utilization: 0.4% (49/12,288 slices)
  - Less than one half of one percent of chip resources
- Search technique can be adapted for other types of data matching and modification
  - Regular expressions
  - Parsing image content
  - …
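The three headline numbers follow directly from the figures quoted on the slide. The short check below recomputes them; the variable names are illustrative, and the rate assumes one single-cell packet every 14 clocks.

```python
critical_path_ns = 8.4
rad_clock_hz = 100e6          # 10 ns RAD clock period
clocks_per_cell = 14
slices_used, slices_total = 49, 12_288

fmax_mhz = 1e3 / critical_path_ns                    # 1 / 8.4 ns, in MHz
cells_per_second = rad_clock_hz / clocks_per_cell    # one cell every 14 clocks
utilization_pct = 100 * slices_used / slices_total

print(f"Maximum frequency: {fmax_mhz:.0f} MHz")
print(f"Cell rate:         {cells_per_second / 1e6:.1f} million per second")
print(f"Slice utilization: {utilization_pct:.1f} %")
```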

CAM-based Packet Matching
Sample packet:
- Source Address = 128.252.5.5 (dotted decimal)
- Destination Address = 141.142.2.2 (dotted decimal)
- Source Port = 4096 (decimal)
- Destination Port = 50 (decimal)
- Protocol = TCP (6)
- Payload = “Consolidate your loans. CALL NOW”
- Payload Lists = { General SPAM (0), Save Money SPAM (1) }
- Content Vector = “00000011” (binary) = x”03” (hex)
CAM key layout (hex values):
  Bits 111–104   Content = 03
  Bits 103–72    Src IP = 80FC0505
  Bits 71–40     Dest IP = 8D8E0202
  Bits 39–8      Src Port = 1000, Dest Port = 0050
  Bits 7–0       Proto = 06

Sample Filter
- Source Address = 128.252.0.0 / 16
- Destination Address = 141.142.0.0 / 16
- Source Port = Don’t Care
- Destination Port = 50
- Protocol = TCP (6)
- Payload includes general SPAM (List 0)
Filter value, mask, and the sample packet (hex; mask: 1 = care, 0 = don’t care):
  Field       Value      Mask       IP Packet
  Content     01         01         03
  Src IP      80FC0000   FFFF0000   80FC0505
  Dest IP     8D8E0000   FFFF0000   8D8E0202
  Src Port    0000       0000       1000
  Dest Port   0050       FFFF       0050
  Proto       06         FF         06
DROP the packet: it matches the filter.
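The match rule is a ternary comparison: a packet key matches when it agrees with the filter value on every bit where the mask is 1. The sketch below packs the slide's fields into the 112-bit layout (content in the top byte, protocol in the bottom byte) and applies that rule. The helper names are illustrative, and the port fields use the hex values as printed on the slides (0050 for the destination port).

```python
def pack_key(content, src_ip, dst_ip, src_port, dst_port, proto):
    """Pack the fields into one 112-bit integer, content in bits 111..104."""
    return (
        (content << 104)
        | (src_ip << 72)
        | (dst_ip << 40)
        | (src_port << 24)
        | (dst_port << 8)
        | proto
    )


def matches(key, value, mask):
    """Ternary match: compare only the bit positions where mask is 1."""
    return (key & mask) == (value & mask)


packet = pack_key(0x03, 0x80FC0505, 0x8D8E0202, 0x1000, 0x0050, 0x06)
value = pack_key(0x01, 0x80FC0000, 0x8D8E0000, 0x0000, 0x0050, 0x06)
mask = pack_key(0x01, 0xFFFF0000, 0xFFFF0000, 0x0000, 0xFFFF, 0xFF)

action = "DROP" if matches(packet, value, mask) else "FORWARD"
print(format(packet, "028x"), action)
```

The content mask of 01 tests only the General SPAM bit, so the packet matches even though its content vector (03) also has the Save Money SPAM bit set.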

Packet Classifier with FlowID
[Figure: CAM table of N entries, each holding a 16-bit Flow ID, a 112-bit CAM MASK, and a 112-bit CAM VALUE. The bits in the IP header (source address, destination address, source port, dest. port, protocol) and the payload match bits feed the mask matchers and value comparators; the resulting flow list passes through a priority encoder to produce the 16-bit resulting flow identifier.]
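A software model of the classifier makes the role of the priority encoder explicit. In hardware every entry is compared in parallel; the loop below visits entries in order instead. The sketch assumes that the lowest-numbered matching entry wins, and the table contents and flow IDs are made up for illustration.

```python
from typing import List, Optional, Tuple

CamEntry = Tuple[int, int, int]   # (flow_id, mask, value)


def classify(key: int, table: List[CamEntry]) -> Optional[int]:
    """Return the flow ID of the first entry whose masked value equals the masked key."""
    for flow_id, mask, value in table:
        if (key & mask) == (value & mask):
            return flow_id          # priority encoder: first match wins (assumption)
    return None                     # no entry matched


# Hypothetical two-entry table that keys on the low 8 bits (protocol field) only.
table = [
    (7, 0xFF, 0x06),   # protocol == 6 (TCP)  -> flow 7
    (9, 0x00, 0x00),   # mask of all zeros matches anything -> flow 9
]

print(classify(0x06, table), classify(0x11, table), classify(0x06, []))
```

Placing the catch-all entry last gives a default flow without special-casing it in the matching logic.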

Fast IP Lookup Algorithm
- Function: search for the best matching prefix using a trie algorithm
  Prefix    Next Hop
  *         4
  01*       7
  10*       2
  110*      9
  0001*     1
  1011*     1
  00110*    5
  01011*    3
A trie is a tree-based data structure for storing strings in order to support fast pattern matching. The name “trie” comes from the word “retrieval”, since the main application of tries is in information retrieval.
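Best-matching-prefix search over a trie can be sketched with a plain binary trie: walk the address one bit at a time and remember the last next hop seen. This is a minimal illustration, not the memory-optimized structure a hardware lookup engine would use. The prefixes come from the slide's table, but the pairing of prefixes with next-hop values is read from a flattened table and should be treated as an assumption.

```python
class TrieNode:
    """One node of a binary trie; next_hop is set only where a prefix ends."""

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


def insert(root, prefix, next_hop):
    """Insert a prefix given as a string of '0'/'1' characters ('' is the default route)."""
    node = root
    for bit in prefix:
        index = int(bit)
        if node.children[index] is None:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.next_hop = next_hop


def lookup(root, address_bits):
    """Return the next hop of the longest prefix that matches address_bits."""
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break                      # fell off the trie: keep the best so far
        if node.next_hop is not None:
            best = node.next_hop       # a longer matching prefix was found
    return best


ROUTES = {"": 4, "01": 7, "10": 2, "110": 9,
          "0001": 1, "1011": 1, "00110": 5, "01011": 3}

root = TrieNode()
for prefix, hop in ROUTES.items():
    insert(root, prefix, hop)

print(lookup(root, "01011000"), lookup(root, "01100000"), lookup(root, "11100000"))
```

An address starting 01011 matches both 01* and 01011*, and the walk reports the longer one; an address that matches no specific prefix falls back to the default route.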

Hardware Implementation in the FPX
[Figure: RAD FPGA containing the SRAM1 interface (request/grant), VCI remapping for IP packets, IP header extraction, the IP lookup engine, a counter, the on-chip cell store, the packet reassembler, and the control cell processor, with SRAM1 and SRAM2 attached; the NID FPGA connects the RAD to the line card (LC) and switch (SW).]

Pipelined FIPL Operations
[Figure: pipeline diagram with time (cycles) on one axis and space (parallel lookup units on the FPGA) on the other. Stages: generate address, latch ADDR into SRAM, SRAM D <- M[A], latch data into FPGA, compute.]
- Throughput: optimized by interleaving memory accesses
  - Operate 5 parallel lookups
  - t_pipelined_lookup = 550 ns / 5 = 110 ns
  - Throughput = 9.1 million packets / second
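The throughput figure is simple arithmetic once the lookups are interleaved: five lookups in flight, each taking 550 ns, complete at an average of one every 110 ns. The helper below just restates that calculation; its name and signature are illustrative.

```python
def pipelined_throughput(lookup_latency_ns, parallel_lookups):
    """Return (effective time per lookup in ns, lookups per second)."""
    effective_ns = lookup_latency_ns / parallel_lookups
    return effective_ns, 1e9 / effective_ns


effective_ns, lookups_per_second = pipelined_throughput(550, 5)
print(f"{effective_ns:.0f} ns per lookup, "
      f"{lookups_per_second / 1e6:.1f} million lookups per second")
```

A single lookup engine without interleaving would manage only about 1.8 million lookups per second (1 / 550 ns), so the fivefold gain comes entirely from keeping the SRAM busy.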

Other Modules Implemented
- IPv4 CAM Filter: 104-bit header matching
- Fast IP Lookup (FIPL): longest prefix match; MAE-West at 10M pkts/second
- Packet Content Scanner: regular expression search
- Data Queueing: per-flow queue in SDRAM
- IPv6 Tunneling Module: tunnels IPv6 over IPv4
- Statistics Module: event counter
- Traffic Generator: per-flow mixing
- Video Recoder: Motion JPEG
- Embedded Processor: KCPSM

Summary
- Field Programmable Port Extender (FPX)
  - Network-accessible hardware
  - Reprogrammable Application Device
- Module deployment
  - Modules implement fast processing on data flow
  - Network allows arbitrary topologies of distributed systems
- Project website: http://www.arl.wustl.edu/arl/projects/fpx/