Programming a Hyper-Programmable Architecture for Networked Systems. Eric Keller and Gordon Brebner, Xilinx Research Labs, USA.



Hyper-Programmable Architectures for Networked Systems Gordon Brebner, Phil James-Roxby, Eric Keller, Chidamber Kulkarni and Chris Neely Xilinx Research Labs, USA

What this talk is about
- Message Processing (MP) as a specific domain, addressing adaptable networked systems
- The Hyper-Programmable MP (HYPMEP) environment for domain-specific harnessing of programmable logic devices
- HAEC, an XML-based Level 2 API for the HYPMEP soft platform
- In brief, an initial experiment with HAEC

Networking everywhere
- "Ambient intelligence"
- "Disappearing computer"
- "Pervasive computing"
- "Ubiquitous computing"
[Figure labels: Network; Networks on chip; Theories of interaction]

Message Processing (MP)
- Key future computation+communication paradigm
- "Message" chosen as neutral term, encompassing "cell", "datagram", "data unit", "frame", "packet", "segment", "slot", "transfer unit", etc.
- MP is 'intermediate' between Digital Signal Processing (DSP) and Data Processing (DP):
  - Like DSP, MP seems natural PLD territory
  - But, like DP, MP has more complex data types and more processing irregularity than DSP

Example: MP-style operations
- Is this message for me?
- Do I want this message?
- Change the address on this message.
- Break this message into two parts.
- Translate this message to another language.
- Validate a signature on this message.
- Retrieve this message from my mailbox.
- Queue this message up for delivery.

Classes of MP operations
- Matching and lookup: read-only on messages; results used for control
- Simple manipulations (that can be combined): read/write on specific message fields
- Characteristic domain-specific computations: hook to allow complex (DSP or DP style) operations
- Message marshalling: movement, queueing and scheduling of messages
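The four classes above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Message` type, its `dst` and `payload` fields, and the function names are hypothetical and are not part of HYPMEP or HAEC.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    dst: str        # hypothetical destination field
    payload: bytes  # hypothetical message body


def is_for_me(msg: Message, my_addr: str) -> bool:
    """Matching and lookup: read-only; the result steers control."""
    return msg.dst == my_addr


def readdress(msg: Message, new_dst: str) -> Message:
    """Simple manipulation: read/write on one specific field."""
    return replace(msg, dst=new_dst)


def checksum(msg: Message) -> int:
    """Domain-specific computation: stand-in for a DSP/DP-style block."""
    return sum(msg.payload) % 256


def enqueue(queue: deque, msg: Message) -> None:
    """Message marshalling: queue the message for later delivery."""
    queue.append(msg)


def process(msg: Message, my_addr: str, next_hop: str, out_queue: deque):
    """Consume messages addressed to us; forward everything else."""
    if is_for_me(msg, my_addr):
        return checksum(msg)
    enqueue(out_queue, readdress(msg, next_hop))
    return None
```

The matching result only selects a branch, which mirrors the slide's point that lookups are read-only and feed control rather than data.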

Comparison of DSP, MP and DP

Programmable logic
- Earliest: programmable array logic (PAL) and programmable logic array (PLA) devices
  - restrictions on structure of implemented logic circuitry
- Then: the Field Programmable Gate Array (FPGA)
  - basic device architecture has a large (up to multi-million) array of programmable logic elements interfaced to programmable interconnect elements
- Now: the Platform FPGA
  - a heterogeneous programmable system-on-chip device

Today's Platform FPGA
- No longer just an array of programmable logic
- Example shown: Xilinx Virtex-4 (launched in September 2004)
- Very important: the programmable interconnect

PLDs for networked systems
- Vast bulk of successful present-day use:
  - PLD as direct substitute for ASIC or ASSP on board
  - conventional hardware (+software) design flow
- Maybe map network processor to PLD instead of ASIC
- Future opportunity: deliver modern PLD attributes directly to networked applications
  - remove bottlenecks from traditional design flows
  - implementations are still mainly a research topic

HYPMEP Environment
[Figure: layered diagram. Labels: design automation tools for MP users (entry, debug, ...); HYPMEP soft platform (API access, efficient mapping, hooks for existing IP cores and software); programmable logic devices. Annotations: "Provide concurrency, interconnection and programmability"; "Exploit concurrency, interconnection and programmability".]

Example: design entry in Click
- By Kohler et al (MIT, 2001)
- Shows a standards-compliant two-port IP packet router
- Each box is an instance of a pre-defined Click element
- Packets are 'pushed' and 'pulled' through the graph
- There are 16 elements on the data forwarding path
[Figure legend: Lookup; Queue; Simple op; Input; Output]

HYPMEP soft platform APIs
- Level of abstraction determines complexity of compiler for efficient mapping to PLD
- Three levels of abstraction being investigated:
  - HIC: abstracted functions and memories
  - HAEC: abstracted functions; memory blocks
  - HOC: explicit function and memory blocks
- Backward mapping is as important as forward mapping, to preserve user abstraction level for testing, debugging and monitoring

Main HAEC components
- Threads: lightweight concurrent message processing entities compiled to PLD implementations
- Hooks: wrappers for existing functional blocks with PLD implementations
- Interfaces: for moving messages into or out of the system perimeter
- Memories: for storage of messages, system state or system data

System control flows
- A control flow is associated with each individual message within the system
- In the simple case of message in/message out:
  - begins with thread activation on arrival of message
  - ... thread starts one or more threads or hooks
  - ... threads in turn can start more threads or hooks
  - ... ultimately a thread handles departure of message
- Based upon a lightweight start/stop mechanism
- These are data plane flows; there are also control plane control flows
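A minimal Python sketch of one message's control flow, under stated assumptions: the thread names (`rx`, `parse`, `lookup`, `tx`) and the `valid` field are hypothetical, and the sketch runs activations one after another, whereas the slides describe threads that run concurrently in logic.

```python
from collections import deque


def run_control_flow(message, entry, threads):
    """Follow one message from arrival to departure.

    Each thread is modelled as a function that returns the names of the
    threads (or hooks) it starts; returning an empty list means it stops.
    """
    trace = []
    pending = deque([entry])          # activated on message arrival
    while pending:
        name = pending.popleft()
        trace.append(name)
        pending.extend(threads[name](message))
    return trace


# Hypothetical thread set for a message in/message out system.
THREADS = {
    "rx": lambda msg: ["parse"],
    "parse": lambda msg: ["lookup", "tx"] if msg["valid"] else [],
    "lookup": lambda msg: [],
    "tx": lambda msg: [],             # handles departure of the message
}
```

For a valid message the trace ends with the departure thread; an invalid message is dropped because `parse` starts nothing further.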

Threads
- Each thread is implemented as a custom finite state machine, and threads run concurrently
- Concurrent instructions are associated with each state, with dedicated implementations
- The instruction set may itself be programmed: seek simple operations fitted to message processing
- Instructions include memory accesses, and operations to interact with other threads
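The idea of a thread as a finite state machine whose states each carry a set of concurrent instructions can be sketched in Python. This is an illustration only, not HAEC syntax or the compiler's output: the `program` table, register names and state names are assumptions. Concurrency is modelled by letting every instruction in a state read the same register snapshot before any write is committed.

```python
def step(state, regs, program):
    """Execute one state: all its instructions see the same snapshot."""
    instructions, next_state = program[state]
    snapshot = dict(regs)             # values at the start of the state
    updates = {}
    for instr in instructions:
        updates.update(instr(snapshot))
    regs.update(updates)              # commit all writes together
    return next_state(regs)


def run(regs, program, start="IDLE", stop="DONE"):
    """Run the state machine until it reaches the stop state."""
    state = start
    while state != stop:
        state = step(state, regs, program)
    return regs


# Hypothetical two-state thread: clear a counter, then swap two registers
# and bump the counter in a single state.
program = {
    "IDLE": ([lambda r: {"count": 0}], lambda r: "SWAP"),
    "SWAP": ([lambda r: {"a": r["b"]},
              lambda r: {"b": r["a"]},
              lambda r: {"count": r["count"] + 1}],
             lambda r: "DONE"),
}
```

The swap works without a temporary because both instructions read the old values, which is the behaviour one would expect from instructions that execute in the same clock cycle.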

Example HAEC code for thread …

Inter-thread communication
- Have standard start/stop (and pause/resume) synchronization mechanism, seen earlier
- Two direct communication mechanisms:
  - lightweight direct data passing and signaling between two threads
  - data channels between threads: extra functionality can reside in the channel
- Indirect communication via shared memory is also possible (with care of course)
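A data channel that carries extra functionality can be sketched with Python's standard `queue` and `threading` modules. The `Channel` class, its `transform` parameter and the producer/consumer functions are hypothetical names chosen for this sketch; software threads stand in for the hardware threads described in the slides.

```python
import queue
import threading

_CLOSED = object()  # sentinel marking the end of the stream


class Channel:
    """Bounded channel between two threads; `transform` runs inside it."""

    def __init__(self, transform=lambda m: m, depth=4):
        self._q = queue.Queue(maxsize=depth)
        self._transform = transform

    def send(self, msg):
        self._q.put(self._transform(msg))

    def close(self):
        self._q.put(_CLOSED)

    def recv(self):
        """Return the next message, or None once the channel is closed."""
        item = self._q.get()
        return None if item is _CLOSED else item


def producer(ch, msgs):
    for m in msgs:
        ch.send(m)
    ch.close()


def consumer(ch, out):
    while (m := ch.recv()) is not None:
        out.append(m)


def demo():
    ch = Channel(transform=bytes.upper)   # functionality living in the channel
    out = []
    t1 = threading.Thread(target=producer, args=(ch, [b"ab", b"cd"]))
    t2 = threading.Thread(target=consumer, args=(ch, out))
    t1.start(); t2.start()
    t1.join(); t2.join()
    return out
```

Neither endpoint knows about the transformation, which is one way to read the slide's remark that extra functionality can reside in the channel.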

Hooks and blocks
- Threads provide a basis for programming many common processing tasks for network protocols
- Use hooks and blocks in other cases:
  - algorithms without a natural FSM model (e.g. encryption)
  - implementations that already exist in logic or software
- A hook is the interfacing wrapper for a block:
  - allows activation of block by threads
  - allows connection of blocks to memories
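As a rough software analogy for a hook, the sketch below wraps an existing function so that a thread can activate it and so that it reads and writes a shared memory. The `Hook` class and its `start` method are hypothetical; `zlib.crc32` from the Python standard library merely stands in for a pre-existing block.

```python
import zlib


class Hook:
    """Wrapper giving an existing block a uniform start interface."""

    def __init__(self, block, memory):
        self._block = block      # existing implementation, used as-is
        self._memory = memory    # memory the block is connected to

    def start(self, src, dst):
        """Activate the block on memory[src]; store the result in memory[dst]."""
        self._memory[dst] = self._block(self._memory[src])
        return "done"


# A thread would activate the hooked block like this.
memory = {"msg": b"hello"}
crc_hook = Hook(zlib.crc32, memory)
status = crc_hook.start("msg", "crc")
```

The block itself is untouched; only the wrapper knows about activation and memory connections.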

Interfaces and memories
- Interface:
  - has an internal hook-style interface to block
  - has an external interface for the block
  - associated threads handle message input/output
- Memory:
  - memory blocks present one or more ports to threads
  - ports are accessed by thread instructions
  - used for messages, lookup tables and state

Mapping HYPMEP to PLDs
- Must be efficient:
  - system: resource usage, timing, power
  - messages: throughput, latency, reliability, cost
- Interface-centric system model
  - as opposed to processor-centric for example
  - placement and usage of interfaces, memories and their interconnection dominates the mapping
- Standard tools for design-time hyper-programmability
- More specialized tools for run-time reconfiguration

Compiling HAEC to VHDL
- Each system component instantiated in HAEC is mapped to a hardware entity on the FPGA:
  - threads mapped to custom hardware
  - generation of signals required between threads
  - hooked blocks, interfaces and memories already exist as pre-defined netlists and are stitched in
- One major contribution of the compiler is the automatic generation of clock signals
  - transition from software world to hardware world

Remote Procedure Call example
- RPC protocol underpins Network File System (NFS) for example
- RPC over UDP over IP over Ethernet protocol stack
- FPGA is acting as a genuine Internet server
- End system example, as opposed to intermediate system (e.g. bridge, router)
- Before: use a 2 GHz Linux PC
- After: use a small FPGA (Xilinx XC2VP7)
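To make the RPC over UDP over IP over Ethernet layering concrete, here is a minimal Python sketch that walks the headers of a raw frame and extracts the first two fields of an ONC RPC message (transaction id and message type). It assumes an untagged Ethernet II frame carrying IPv4, and it assumes the UDP payload is RPC without checking ports. The function name `rpc_header` is hypothetical, and the sketch says nothing about how the FPGA design in the talk is structured.

```python
import struct

ETHERTYPE_IPV4 = 0x0800   # Ethernet II type field for IPv4
IPPROTO_UDP = 17          # IPv4 protocol number for UDP
ETH_HLEN = 14             # destination MAC, source MAC, EtherType
UDP_HLEN = 8              # source port, destination port, length, checksum


def rpc_header(frame: bytes):
    """Return (xid, msg_type) from an RPC-over-UDP frame, else None."""
    if len(frame) < ETH_HLEN + 20:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None                       # "is this message for me?"
    ip = ETH_HLEN
    ihl = (frame[ip] & 0x0F) * 4          # IPv4 header length in bytes
    if ihl < 20 or frame[ip + 9] != IPPROTO_UDP:
        return None
    rpc = ip + ihl + UDP_HLEN             # start of the UDP payload
    if len(frame) < rpc + 8:
        return None
    # ONC RPC messages begin with a 32-bit xid, then msg_type
    # (0 = CALL, 1 = REPLY), both in network byte order.
    xid, msg_type = struct.unpack_from("!II", frame, rpc)
    return xid, msg_type
```

Each step is a fixed-offset field read followed by a comparison, which is the kind of matching-and-lookup operation the earlier slides classify as natural MP work.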

RPC design results
- Operates at 1 Gb/s line rate
- Per-RPC protocol latency is 2.16 μs
- 7.5X over Linux on a 2 GHz P4; 10X attainable with small modifications
- 2600 logic slices and 5 block RAMs; the Ethernet core accounts for half the slices
- 869 lines of XML-based description, compiled to 2950 lines of VHDL
- Design and implementation time: two person-weeks
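A few derived figures follow directly from the numbers above. The short Python calculation below is a back-of-envelope sketch; in particular, treating the 7.5X figure as a ratio of per-RPC latencies is an assumption, since the slide does not say which metric it refers to.

```python
fpga_latency_us = 2.16     # per-RPC protocol latency on the FPGA
speedup = 7.5              # stated improvement over Linux on a 2 GHz P4

# ASSUMPTION: the 7.5X figure is a latency ratio.
implied_pc_latency_us = fpga_latency_us * speedup     # about 16.2 us

xml_lines, vhdl_lines = 869, 2950
expansion = vhdl_lines / xml_lines                    # about 3.39 VHDL lines per XML line

total_slices = 2600
ethernet_slices = total_slices / 2                    # "half the slices"
other_slices = total_slices - ethernet_slices         # about 1300 for the rest

print(f"implied PC latency: {implied_pc_latency_us:.1f} us")
print(f"code expansion:     {expansion:.2f}x")
print(f"non-Ethernet logic: {other_slices:.0f} slices")
```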

Conclusions and future plans
- Illustration of how PLDs can have primary roles in adaptable networked systems
- First generation of HYPMEP implemented
- Validated by various gigabit rate experiments
- Now exploring embedded networking applications
- Longer-term strategy is to, in tandem:
  - break down traditional hardware/software boundaries
  - break down data plane/control plane boundaries

The End