Jehandad Khan and Peter Athanas Virginia Tech

Presentation transcript:

Creating Custom Network Packet Processing Pipelines on HMC-Enabled FPGAs. Jehandad Khan and Peter Athanas, Virginia Tech

FPGAs for Packet Processing: The ideal co-processor. Highly parallel arbitrary data paths; no cache delays; low power. Being able to simulate a design without going into synthesis is the second benefit. Compared to CPUs and GPUs; design effort.

We FPGAs

FPGAs for Packet Processing: The not-so-ideal co-processor. Long compile times; complicated design process; less abundant expertise; cost; limited memory.

We FPGA Design

Objective of Investigation: FPGAs coupled with HMC? What is the achievable throughput? What is the latency cost? What are the tradeoffs?

Hybrid Memory Cube: Lower power; raw random access bandwidth; atomic operations; latency concerns. Mention the Pico HMC interface and other IP.

Hybrid Memory Cube: Half-width links; 10 AXIS interfaces; 128-bit wide; max throughput 45 Mlps/channel; also capable of connecting to other cubes.

Hybrid Memory Cube [Figure labels: Host; M-700 Backplane; AC-510 (x5)]

Overall Flow: Passes and Projections … Custom pragmas for hardware-specific algorithms

HLS Benefits: Follow up on OpenCL work; improved turnaround time; automatic pipelining; low area overhead; becoming a mature technology; direct integration of custom primitives.

Lookup: On-chip resource limitation; support multiple masks; maximize area / throughput; exploit HMC bandwidth.

Lookup - Off-chip [Figure labels: Incoming Packet; Bloom Filter Mask 0 ... Bloom Filter Mask N; Priority Resolution; Off-chip Access; HMC; Actions; Control Plane; Update; Subsequent Stages]
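The off-chip lookup diagram can be read as: one Bloom filter per mask screens the masked key, only probable hits go to off-chip memory, and the matching entries are resolved by priority. The following is a minimal software sketch of that reading, not the authors' hardware design. The names `BloomFilter` and `MaskedLookup`, the use of seeded CRC32 as the Bloom hash, the 32-bit keys, and the Python dict standing in for the HMC are all assumptions made for illustration.

```python
import zlib


class BloomFilter:
    """Small Bloom filter; uses one byte per bit position for readability."""

    def __init__(self, bits=1024, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray(bits)

    def _positions(self, key: bytes):
        # Assumption: derive several indices by varying the CRC32 start value.
        return [zlib.crc32(key, seed) % self.bits for seed in range(self.hashes)]

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.array[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.array[pos] for pos in self._positions(key))


class MaskedLookup:
    """Multi-mask lookup: per-mask Bloom filter, then off-chip table, then priority."""

    def __init__(self, masks):
        self.masks = masks
        self.filters = [BloomFilter() for _ in masks]
        # Stand-in for the HMC: (mask_id, masked_key) -> (priority, action)
        self.offchip = {}
        self.offchip_reads = 0

    def update(self, mask_id, key, priority, action):
        """Control-plane update: program the filter and the off-chip entry."""
        masked = key & self.masks[mask_id]
        self.filters[mask_id].add(masked.to_bytes(4, "big"))
        self.offchip[(mask_id, masked)] = (priority, action)

    def lookup(self, key):
        best = None
        for mask_id, mask in enumerate(self.masks):
            masked = key & mask
            if not self.filters[mask_id].might_contain(masked.to_bytes(4, "big")):
                continue  # definite miss: no off-chip access needed
            self.offchip_reads += 1
            entry = self.offchip.get((mask_id, masked))
            if entry is not None and (best is None or entry[0] > best[0]):
                best = entry  # priority resolution: highest priority wins
        return None if best is None else best[1]


table = MaskedLookup(masks=[0xFFFFFF00, 0xFFFF0000])
table.update(0, 0x0A000100, priority=24, action="to_port_1")  # 10.0.1.0/24
table.update(1, 0x0A000000, priority=16, action="to_port_2")  # 10.0.0.0/16
result = table.lookup(0x0A000105)  # 10.0.1.5 matches both masks
```

In hardware the per-mask filters would presumably be queried in parallel rather than in a loop; the loop here only keeps the sketch short.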

Lookup On-Chip: Suitable for small exact match tables; hashed access; CRC32 as the hash function of choice.
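A hashed exact-match table of the kind named on this slide can be sketched as below. This is an illustrative model only: the class name `ExactMatchTable`, the one-entry-per-bucket layout, and the policy of rejecting colliding inserts are assumptions, since the slides do not say how collisions are handled. `zlib.crc32` computes the standard CRC-32.

```python
import zlib


class ExactMatchTable:
    """Direct-indexed exact-match table addressed by the low bits of CRC32(key)."""

    def __init__(self, index_bits=10):
        self.index_mask = (1 << index_bits) - 1
        self.slots = [None] * (1 << index_bits)

    def index_of(self, key: bytes) -> int:
        # Hashed access: truncate the 32-bit CRC to the table's index width.
        return zlib.crc32(key) & self.index_mask

    def insert(self, key: bytes, action) -> bool:
        idx = self.index_of(key)
        slot = self.slots[idx]
        if slot is not None and slot[0] != key:
            return False  # assumed policy: a colliding key is rejected
        self.slots[idx] = (key, action)
        return True

    def lookup(self, key: bytes):
        slot = self.slots[self.index_of(key)]
        # Compare the stored key so a hash collision is never reported as a match.
        if slot is not None and slot[0] == key:
            return slot[1]
        return None


flows = ExactMatchTable(index_bits=8)
inserted = flows.insert(bytes([10, 0, 0, 1]), "rewrite_src")
hit = flows.lookup(bytes([10, 0, 0, 1]))
miss = flows.lookup(bytes([10, 0, 0, 2]))
```

Storing the full key next to the action is what makes the table exact match; the CRC only selects the bucket.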

Parser: Expression balancing; lookahead implementation; variable FLIT width; unroll parser loops.
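To illustrate what unrolling a parser loop means, the sketch below parses stacked VLAN tags twice: once with a bounded loop and once with the loop flattened into fixed stages, which is the shape a hardware pipeline needs. The VLAN example, the bound `MAX_TAGS = 2`, and the function names are assumptions chosen for illustration; the slides do not specify which headers the generated parser handles.

```python
VLAN_TPID = 0x8100  # EtherType value that marks an 802.1Q VLAN tag
MAX_TAGS = 2        # assumed upper bound on stacked tags


def be16(frame: bytes, offset: int) -> int:
    """Read a big-endian 16-bit field."""
    return int.from_bytes(frame[offset:offset + 2], "big")


def parse_vlans_loop(frame: bytes):
    """Loop form: iterate while the next EtherType is a VLAN tag, up to MAX_TAGS."""
    offset = 12  # skip destination and source MAC addresses
    tags = []
    while len(tags) < MAX_TAGS and be16(frame, offset) == VLAN_TPID:
        tags.append(be16(frame, offset + 2) & 0x0FFF)  # 12-bit VLAN ID
        offset += 4
    return tags, be16(frame, offset)


def parse_vlans_unrolled(frame: bytes):
    """Unrolled form: one fixed stage per possible tag, every offset a constant."""
    tags = []
    if be16(frame, 12) != VLAN_TPID:          # stage 0
        return tags, be16(frame, 12)
    tags.append(be16(frame, 14) & 0x0FFF)
    if be16(frame, 16) != VLAN_TPID:          # stage 1
        return tags, be16(frame, 16)
    tags.append(be16(frame, 18) & 0x0FFF)
    return tags, be16(frame, 20)


macs = bytes(12)
double_tagged = macs + bytes.fromhex("81000064" "810000c8" "0800")
untagged = macs + bytes.fromhex("0800")
```

Both functions return the same result for any frame long enough to hold the fields; the unrolled version trades code size for constant offsets and a fixed stage count.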

Salient Features: Unroll parser loops; add HMC interface; merge Ingress and Egress; generate Control Plane API; add measurement logic.

Simple NAT: Mention the data flow aspect.

Results Area Utilization (placeholder)

Future Work: CMCs; atomic operations; take operations closer to memory; flow counters.

Questions

About Me: PhD Candidate @ Virginia Tech; advised by Prof. Peter Athanas; graduating this fall. Networking + FPGAs; algorithm acceleration.