Programming Model for Network Processing on FPGAs Eric Keller October 8, 2004 M.S. Thesis Defense.

Presentation transcript:

Programming Model for Network Processing on FPGAs Eric Keller October 8, 2004 M.S. Thesis Defense

2 Abstract
Programming model for implementing network processing applications on an FPGA
Present an API to higher level tools
  – Programming Language: presents an abstraction in terms of resources more suitable to the networking domain
  – Compiler: generates hardware from this description
Demonstrate through four applications
  – Aurora to GigE Bridge, RPC, IP Router, NAT

3 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

4 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

5 Tools for FPGAs
Hardware Description Languages
  – Verilog, VHDL
Structural High-Level Languages
  – JHDL, JBits
Behavioral High-Level Languages
  – Handel-C, Forge
Domain Specific Languages
  – Cliff, Snort, Ponder

6 Cliff
Maps Click to Xilinx FPGAs
Click is a domain specific language for networking
  – Modular router on Linux
  – Elements of common operations, e.g. Decrement TTL
Elements written in Verilog
Script to put system together
[Figure: example element pipeline. Labels: Input, Lookup, Queue, Simple op, Output]
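To make the "element of a common operation" idea concrete, here is a minimal software sketch of a Decrement TTL element operating on a bare IPv4 header. It is not Cliff's Verilog: the function names are hypothetical, the drop-when-expired behaviour is an assumption, and the checksum is simply recomputed in full rather than patched incrementally as a hardware element might do.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the header, taken as 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def dec_ip_ttl(header: bytes):
    """Return the header with TTL decremented, or None if it should be dropped."""
    ttl = header[8]                         # TTL is byte 8 of the IPv4 header
    if ttl <= 1:
        return None                         # assumed policy: drop expired packets
    hdr = bytearray(header)
    hdr[8] = ttl - 1
    hdr[10:12] = b"\x00\x00"                # zero the checksum field first
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

A header produced this way verifies in the usual manner: running `ipv4_checksum` over the complete header, checksum field included, gives zero.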

7 Networking on FPGAs
Routing and Switching
  – MIR, IP Lookup, Crossbar Switch
Protocol Boosters
  – Error coding, encryption, compression
Security
  – Virus Scanning, Firewall
Web Server
  – TCP/IP in Hardware
  – x speedup over Sun/Intel based workstations

8 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

9 Motivation
Goal: Create a design environment that allows networking experts to use FPGAs
Several point solutions have shown FPGAs to be a good solution
Domain specific languages
  – There is not a standard high-level tool
Use MIR as a starting framework
  – Collaborating threads processing a message
  – Flexible architecture for memory and communication

10 Design API
Present an API to higher level tools
  – No leading high-level design entry for networking domain
Presents an abstraction in terms of resources suitable to the networking domain
  – e.g. threads
Allow specification of architecture as well as functionality
Generate hardware from this description
  – Generate VHDL
  – Rely on existing back-end tools for mapping to FPGA
Present an intermediate textual format
  – XML

11 Design Hierarchy
[Figure: design hierarchy. Labels: High Level Tools (Teja, Click, Novalit, ...); Programming Interface; soft architecture - mapping; Back-end tools; Platform FPGAs]

12 Design Flow
Main focus: XML to VHDL to bit
[Figure: design flow. Labels, in order: XML Description (programming language); API (Compiler); Hardware description; Back-end tools; Configuration Bitstream]

13 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

14 Abstraction Primitives
[Figure: abstraction primitives. Labels: Interface to External System; Intellectual Property; Memory; Thread; communication; synchronization]

15 Threads
Micro-engines with instruction level parallelism
  – Instruction set and conditionals used to program
  – User defined variables
Implemented as custom hardware
  – Not a microprocessor with fetch, decode, execute
Synchronization
  – Activate, Deactivate
Communication
  – Lightweight, channels
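The activate/deactivate and channel primitives can be modelled in software as a rough illustration. The sketch below is an assumption-laden toy: `Channel`, `Thread`, and `demo` are hypothetical names, the channel is assumed to hold a single value, and the threads are stepped one after another inside a loop standing in for a clock, whereas the generated hardware threads would run concurrently.

```python
class Channel:
    """Single-slot channel: holds at most one value between two threads."""

    def __init__(self):
        self._slot = None

    def put(self, value):
        """Store a value; return False if the previous one is still unread."""
        if self._slot is not None:
            return False
        self._slot = value
        return True

    def get(self):
        """Remove and return the stored value (None if the channel is empty)."""
        value, self._slot = self._slot, None
        return value


class Thread:
    """A thread runs its body once per cycle, but only while it is active."""

    def __init__(self, body):
        self.body = body
        self.active = False

    def activate(self):
        self.active = True

    def deactivate(self):
        self.active = False

    def step(self):
        if self.active:
            self.body(self)


def demo(cycles=3):
    chan = Channel()
    received = []

    def rx_body(thread):
        # Hand one value to the TX thread, wake it, then go idle.
        if chan.put(0x2A):
            tx.activate()
            thread.deactivate()

    def tx_body(thread):
        received.append(chan.get())
        thread.deactivate()

    rx, tx = Thread(rx_body), Thread(tx_body)
    rx.activate()
    for _ in range(cycles):          # each iteration models one clock cycle
        for t in (rx, tx):
            t.step()
    return received
```

Calling `demo()` shows the hand-off: the RX thread passes a single value through the channel and activates the TX thread, after which both threads stay idle.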

16 Intellectual Property
Allow users to make use of pre-designed intellectual property (also called cores)
Not all algorithms are best expressed as a finite state machine
  – e.g. encryption, compression
User must:
  – define the interface
  – instantiate using an “include” type statement
  – associate with a thread

17 Interfaces
Perimeter of the defined system
  – System can be whole FPGA or part of larger design
Exists as pre-defined netlist
  – Gigabit Ethernet, Aurora
Interface includes:
  – Grouping of signals into ports
  – Extra functionality, e.g. perform framing and error detection
  – Protocol to get the message
Threads interact with the interface
Instantiation involves an “include” type statement

18 Memory
Provide buffering of messages, tables for lookup, storage of state
Parameterizable
  – Selection of different memories exists as pre-defined netlists (…for now), each possibly being parameterizable
Instantiate through “include” type statement
Associate a memory port with a thread

19 Memory (cont’d)
FIFO PutGet
  – Queue of objects, commit mechanism
SharedMemory
  – Single memory shared by multiple accessors
  – Locking mechanism via BRAMs “READ_FIRST”
DPMem
  – Multiple memories shared by multiple accessors
  – Allocation mechanism
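The slide does not spell out how the FIFO's commit mechanism behaves. One plausible reading, sketched below purely as an assumption, is that written words stay invisible to the reader until the writer commits them, so a frame that turns out to be bad can be discarded. The class name `CommitFifo` and the `abort` operation are hypothetical.

```python
from collections import deque


class CommitFifo:
    """FIFO whose writes only become readable after an explicit commit."""

    def __init__(self, depth):
        self.depth = depth
        self._committed = deque()   # words visible to the reader
        self._pending = []          # words written but not yet committed

    def put(self, word):
        """Write one word; return False if the FIFO is full."""
        if len(self._committed) + len(self._pending) >= self.depth:
            return False
        self._pending.append(word)
        return True

    def commit(self):
        """Make all pending words visible to the reader."""
        self._committed.extend(self._pending)
        self._pending.clear()

    def abort(self):
        """Discard pending words, e.g. after a failed error check (assumed use)."""
        self._pending.clear()

    def get(self):
        """Read the oldest committed word, or None if nothing is available."""
        return self._committed.popleft() if self._committed else None
```

In this model a receive thread would `put` each word of a frame as it arrives and call `commit` only once the whole frame has been accepted.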

20 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

21 Hardware Generation
Process of mapping system resources to the hardware
Generate VHDL
  – One module per thread
  – Top level module hooking all components together
  – Memories, interfaces, channels exist as predefined netlists
Rely on back-end tools to create bitstream
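As a rough picture of "one module per thread", the sketch below walks an XML description and emits one VHDL skeleton per thread. The talk does not show the actual XML format, so the `<system>` and `<thread>` element names, the `name` attribute, and the function `generate_thread_modules` are all invented for illustration; the emitted text is only an outline in the style of the thread skeleton in the talk, not synthesizable VHDL.

```python
import xml.etree.ElementTree as ET

# Hypothetical description; the real schema is not given in the slides.
DESCRIPTION = """
<system name="bridge">
  <thread name="aurora_rx"/>
  <thread name="gmac_tx"/>
</system>
"""

# Skeleton only: comments stand in for the generated ports and logic.
THREAD_TEMPLATE = """entity {name} is
  port (
    -- interface
  );
end {name};

architecture behavioral of {name} is
  -- signals
begin
  -- control logic
end behavioral;
"""


def generate_thread_modules(xml_text):
    """Return a dict mapping each thread name to its VHDL skeleton text."""
    root = ET.fromstring(xml_text.strip())
    modules = {}
    for thread in root.iter("thread"):
        name = thread.get("name")
        modules[name] = THREAD_TEMPLATE.format(name=name.upper())
    return modules
```

A real compiler would also emit the top level module and fill in the ports and state machine for each thread; this sketch only shows the per-thread fan-out.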

22 Top Level
entity SYSTEM is
  port (
    -- interface
  );
end SYSTEM;
architecture struct of SYSTEM is
  -- signals
begin
  -- synchronization logic
  -- instantiate each component
  -- (interfaces, memories, threads, externally defined IP, channels)
end struct;

23 Clocks
Interfaces determine clock domains
[Figure: clock domains. Labels: I/F X; I/F Y; blocks A to H; memory with Port A and Port B; Clock Domain 1; Clock Domain 2]

24 Thread
entity THREAD is
  port (
    -- interface
  );
end THREAD;
architecture behavioral of THREAD is
  -- signals
begin
  -- control logic
  -- combinatorial process
  -- synchronous process
  -- special circuitry for memory reads and channel gets
end behavioral;

25 Special Case Circuitry
Memory
  – READ(var, address)
  – User wants to work with var, not the memory signals
  – Need extra circuitry to enable this
Channels
  – CHAN_GET(var, address): extra conditional testing to see when address matches
  – START(thread, offset): extra circuitry to align the data, e.g. Ethernet header is 14 bytes
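The alignment problem behind START(thread, offset) can be illustrated in software. Assuming a 4-byte datapath (the slides do not state the width), a 14-byte Ethernet header ends two bytes into a word, so every word the next thread sees has to be stitched together from two consecutive input words. The function name `align_words` is hypothetical.

```python
def align_words(words, offset, width=4):
    """Re-pack fixed-width words so that output starts at byte `offset`.

    Mimics a shift-and-merge aligner: each output word is built from the
    tail of one input word and the head of the next. A trailing partial
    word is not emitted in this sketch.
    """
    skip, shift = divmod(offset, width)   # whole words to skip, bytes to shift
    out = []
    prev = None
    for word in words[skip:]:
        if shift == 0:
            out.append(word)              # already aligned: pass through
        elif prev is not None:
            out.append(prev[shift:] + word[:shift])
        prev = word
    return out
```

With `offset=14` and `width=4` this gives `skip=3` and `shift=2`, so the first aligned word holds bytes 14 to 17 of the frame.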

26 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

27 Click
Click is a language for creating modular software routers
  – CLIFF is a tool that will map to FPGAs
  – Using XML instead
Create a base system
  – each element is a thread
  – each thread connects to one port of a DPMem
  – each thread can have state storage through SharedMemory memory element
Series of optimizations
  – some pre-base system, some post-base system

28 Click (cont’d)
[Figure: Click compilation flow. Labels, in transcript order: Click graph; Sub-graph match and replace; .clk; Move elements; .clk; Split Paths; .clk; Create base System; Run Elements in parallel; Merge Elements; Lib. of elements (XML); system.xml]

29 Teja
Teja is a development environment for NPUs
SW Lib: define constructs
  – Events, Data Structures, Components (state machine)
SW Arch: instantiate constructs
HW Arch: define the hardware resources
  – import for fixed defined (like NPUs)
  – create new one for FPGA target
HW Mapping
  – map constructs from SW Arch to resources in HW Arch

30 Teja (cont’d)
[Figure: Teja flow, part 1. Labels: State Machine GUI (C code); Software Arch. GUI; compile; Data Struct. Library (XML); Thread Library (XML); Software Arch file (internal format); (next slide)]

31 Teja (cont’d)
[Figure: Teja flow, part 2. Labels: Hardware Arch. GUI; Hardware Mapping GUI; Map; (prev slide); Thread, DPMem, Aurora, etc.; Hardware Arch file (internal format); Insert lib code; System.xml]

32 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

33 Gigabit Ethernet to Aurora Bridge
Two flows that will convert a frame from one protocol to the other
Ethernet
  – Broadcast protocol (needs addressing)
  – Coarse grain flow control
Aurora
  – Xilinx proprietary protocol for point to point communication over multi-gigabit transceivers
  – Fine grain flow control

34 Bridge Architecture
[Figure: bridge architecture. Labels: Aurora RX thread; Aurora TX thread; TX; RX; GMAC; RX; TX; GMAC TX thread; GMAC RX thread; Put16Get8 Memory; Put8Get16 Memory]
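The memory names in the bridge suggest width conversion between the two sides: words are put at one width and fetched at another. The sketch below models that reading in software. It assumes that "Put16Get8" means 16-bit puts with 8-bit gets (and the reverse for "Put8Get16") and that the high byte comes first; neither detail is stated on the slide.

```python
from collections import deque


class Put16Get8:
    """Accepts 16-bit words and hands them back one byte at a time."""

    def __init__(self):
        self._bytes = deque()

    def put(self, word16):
        self._bytes.append((word16 >> 8) & 0xFF)   # high byte first (assumed)
        self._bytes.append(word16 & 0xFF)

    def get(self):
        """Return the next byte, or None if the memory is empty."""
        return self._bytes.popleft() if self._bytes else None


class Put8Get16:
    """Accepts single bytes and hands them back as 16-bit words."""

    def __init__(self):
        self._bytes = deque()

    def put(self, byte):
        self._bytes.append(byte & 0xFF)

    def get(self):
        """Return the next 16-bit word, or None until two bytes are available."""
        if len(self._bytes) < 2:
            return None
        high = self._bytes.popleft()
        low = self._bytes.popleft()
        return (high << 8) | low
```

Under these assumptions, one direction of the bridge writes 16-bit words into a `Put16Get8` memory for a thread that consumes bytes, and the other direction does the opposite.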

35 Bridge Test Setup

36 Bridge Results
Compared result to VHDL code from XAPP777
  – latency = time from last bit received to first bit sent

37 Remote Procedure Call
Mechanism to invoke a procedure on a remote computer
  – Used in NFS
  – Almost exclusive to workstations
Message carries the parameters to the function as well as information about the function being called
Implement an RPC server with the functions add(x,y) and mult(x,y)
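The essence of the RPC server, a message that names a function and carries its parameters, can be sketched as a small dispatcher. The wire layout used here (a 4-byte procedure number followed by two 4-byte signed integers, with an 8-byte signed reply) and the procedure numbers are invented for illustration; they are not the RPC message format used in the thesis.

```python
import struct

# Hypothetical procedure numbers for the two functions named in the talk.
PROCEDURES = {
    1: lambda x, y: x + y,   # add(x, y)
    2: lambda x, y: x * y,   # mult(x, y)
}

CALL_FORMAT = "!Iii"    # procedure number, x, y (network byte order)
REPLY_FORMAT = "!q"     # 64-bit result, wide enough for any 32-bit product


def handle_call(message: bytes) -> bytes:
    """Decode a call message, run the named procedure, encode the reply."""
    proc, x, y = struct.unpack(CALL_FORMAT, message)
    result = PROCEDURES[proc](x, y)
    return struct.pack(REPLY_FORMAT, result)
```

A client would build a call with `struct.pack(CALL_FORMAT, 1, 2, 3)` and unpack the reply with `REPLY_FORMAT` to read the sum.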

38 RPC Architecture
[Figure: RPC architecture. Labels: GMAC (RX, TX); RX thread; ETH thread; IP thread; UDP thread; RPC thread; broadcast thread; ADD; MULT; TX thread; Put/Get Memories]

39 RPC Test Setup
[Figure: two test setups. Labels: Workstation to Workstation; Workstation to FPGA]

40 FPGA vs Workstation
Perform several RPC calls to each from client workstation
Each server system connected directly to the client through an optical gigabit Ethernet cable

41 Click Based Applications
[Figure: Click graphs for NAT and IP Router. Element labels: IPFilter; Drop; IPaddr rewriter; From Device; To Device; queue; CheckIP Header; Lookup; Drop Broadcasts; DecIPTTL]
NAT
IP Router
  – 2 Port (shown)
  – 16 Port (not shown)

42 Click Results

43 Outline of Talk: Background; Design Flow; User Interface; Compilation to Hardware; High Level Tools; Experiments/Results; Conclusions

44 Conclusions
Presented a programming model for mapping networking applications to FPGAs
  – An API of abstractions (user interface)
  – Generate VHDL from the description (compiler)
Summary
  – Domain specific languages as a target design entry
  – FPGAs as a target for implementation
  – Platform based on threads and flexible memory architecture
    MIR as a starting framework
Demonstrate efficient mappings/designs through four application examples