Fast Lookup for Dynamic Packet Filtering in FPGA REPORTER: HSUAN-JU LI 2014/09/18 Design and Diagnostics of Electronic Circuits & Systems, 17th International.

Fast Lookup for Dynamic Packet Filtering in FPGA. REPORTER: HSUAN-JU LI, 2014/09/18. Paper: Lukáš Kekely, Martin Žádník, Jiří Matoušek, Jan Kořenek, 17th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), April 2014.

Outline: Introduction; Related Work; Design And Architecture; Evaluation And Results; Conclusion

Introduction: Software applications of safety- and security-critical embedded systems are often divided into several self-contained functions. Segregation between individual system partitions and functions is used to confine error propagation. Soft processors are one order of magnitude slower in terms of operating frequency than hard-wired devices.

Introduction (cont.): Current FPGA families provide wide and fast memory attachments, mostly implemented as hard macros that are faster than configurable logic. There is a performance gap between soft processors and the memory attachment. The proposed architecture combines the specific needs of partitioned software with the flexibility of reconfigurable hardware.

Introduction (cont.): Multiple self-contained systems run on a single platform FPGA, which shares the available memory bandwidth among the systems in a predictable and scalable way. The main building blocks of the proposed architecture are secure bus bridges, which are used to form a segregated hierarchy of memory buses.

Introduction (cont.): With secure bus bridges, it is possible to use soft processors for safety- and security-critical functions and to reach high assurance levels with far less effort.

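Related Work: Cuckoo hashing uses two hash functions, $h(x)$ and $h'(x)$, so every key $x$ in a set such as {a, b, c} has two candidate slots: $h(x)$ in the first table and $h'(x)$ in the second. Example with $h(x) = x \bmod 11$ and $h'(x) = \lfloor x/11 \rfloor \bmod 11$: $h(6) = 6 \bmod 11 = 6$ and $h'(6) = \lfloor 6/11 \rfloor \bmod 11 = 0$. Example key set: x = {20, 50, 53, 75}.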
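
The example on this slide can be traced with a small software model. The sketch below is an illustration only, not the hardware design from the paper: it assumes two tables of 11 slots, the two hash functions from the slide, and a hypothetical `max_kicks` limit on the eviction chain.

```python
SIZE = 11  # table size taken from the slide's "mod 11" example

def h1(x: int) -> int:
    return x % SIZE

def h2(x: int) -> int:
    return (x // SIZE) % SIZE

def insert(tables, key, max_kicks=16):
    """Insert key; return None on success or the key left without a slot.

    Assumes the key is not already stored (no duplicate check).
    """
    hashes = (h1, h2)
    side = 0
    for _ in range(max_kicks):
        slot = hashes[side](key)
        # Place the key and pick up whatever occupied the slot before.
        key, tables[side][slot] = tables[side][slot], key
        if key is None:
            return None
        side ^= 1  # the evicted key now tries its slot in the other table
    return key  # eviction chain too long; caller must handle this key

def contains(tables, key):
    # A key can only live in one of its two candidate slots.
    return tables[0][h1(key)] == key or tables[1][h2(key)] == key

tables = [[None] * SIZE, [None] * SIZE]
for k in (20, 50, 53, 75):
    insert(tables, k)
```

Keys 20, 53 and 75 all map to slot 9 of the first table, so inserting 53 evicts 20 to the second table, and inserting 75 evicts 53 in turn.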

Design And Architecture: A. Lookup engine interface and functionality; B. Cuckoo hash lookup engine; C. Binary search tree lookup engine

Design And Architecture (cont.): Lookup engine interface and functionality. [Figure labels: Lookup Engine; Key Width; Data Width; Maximum Capacity; Representation in bits; Interface]

Design And Architecture (cont.): Lookup engine interface and functionality. Lookup procedure. The interface falls into three basic groups: input, output, and configuration.

Design And Architecture (cont.): Lookup engine interface and functionality. Lookup procedure, three basic groups. [Figure labels: Lookup Engine; Input keys; Lookup results (outputs); Routing decision; Key identification; Arbitrary data; 1-bit information; Found; Invalid; Configuration; Every clock cycle]

Design And Architecture (cont.): Cuckoo hash lookup engine

Design And Architecture (cont.): Cuckoo hash lookup engine. The hashes are computed in parallel using a CRC implementation.
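
The slide only names CRC as the hash; it does not give polynomials or widths. As a rough sketch under that caveat, the code below derives one table index per hash table from a bitwise CRC-16, using a different generator polynomial for each table. The polynomials `0x1021` and `0x8005`, the 16-bit width, and the function names are assumptions for illustration. In hardware the CRCs would be evaluated side by side; here they are simply computed one after another.

```python
def crc16(data: bytes, poly: int) -> int:
    """Bitwise, MSB-first CRC-16 with initial value 0 and no final XOR."""
    reg = 0
    for byte in data:
        reg ^= byte << 8
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ poly) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
    return reg

# Hypothetical choice: one polynomial per hash table (d = 2 here).
POLYS = (0x1021, 0x8005)

def table_indexes(key: bytes, table_size: int) -> list[int]:
    """Return one slot index per hash table for the given key."""
    return [crc16(key, poly) % table_size for poly in POLYS]

# Example key: a made-up 4-byte value standing in for a packet field.
indexes = table_indexes(bytes([192, 168, 0, 1]), 1024)
```

Using distinct polynomials rather than one CRC with different start values keeps the indexes from differing by a fixed constant, which matters when an evicted key needs a genuinely different slot.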

Design And Architecture (cont.): Cuckoo hash lookup engine. Reading records: each record consists of a key value and data. Records are read from the hash tables in memory or from the outside register.

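Design And Architecture (cont.): Cuckoo hash lookup engine. The keys of the records that were read are compared for equality with the searched key. At most one comparison is successful; the engine then outputs the data associated with the matching key and sets the flag.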
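
The read-and-compare step can be modeled in a few lines. This is a behavioral sketch, not the authors' circuit: records are `(key, data)` tuples, the hash functions are the toy ones from the Related Work slide, and the names `cuckoo_lookup` and `register` are made up for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Record = Optional[Tuple[int, str]]  # (key, data), or None for an empty slot

def cuckoo_lookup(
    key: int,
    tables: Sequence[List[Record]],
    hashes: Sequence[Callable[[int], int]],
    register: Record = None,
) -> Tuple[bool, Optional[str]]:
    """Return (found flag, data) for the searched key."""
    # Read one candidate record from every hash table, plus the register.
    candidates = [table[h(key)] for table, h in zip(tables, hashes)]
    candidates.append(register)
    # Compare every candidate key with the searched key.
    matches = [rec for rec in candidates if rec is not None and rec[0] == key]
    assert len(matches) <= 1, "a key must be stored at most once"
    if matches:
        return True, matches[0][1]
    return False, None

hashes = (lambda x: x % 11, lambda x: (x // 11) % 11)
tables: List[List[Record]] = [[None] * 11, [None] * 11]
tables[0][9] = (75, "rule-A")   # 75 mod 11 = 9
tables[1][1] = (20, "rule-B")   # floor(20/11) mod 11 = 1
register: Record = (53, "rule-C")  # record held outside the tables

hit = cuckoo_lookup(20, tables, hashes, register)
miss = cuckoo_lookup(50, tables, hashes, register)
```

Because every key has a fixed, small set of candidate locations, the number of reads and comparisons per lookup is constant.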

Design And Architecture (cont.): Cuckoo hash lookup engine. The key set is updated based on the requests received.

Design And Architecture (cont.): Cuckoo hash lookup engine. The controller can evict records from the hash tables on the fly while preserving the set of active keys (reconfiguration cycle).

Design And Architecture (cont.): Cuckoo hash lookup engine. Capacity: $C_{cuckoo} = d \times t + 1$, where $d$ is the number of hash tables (hash functions) used, $t$ is the size of an individual table, and the 1 accounts for the additional reconfiguration register.

Design And Architecture (cont.): Binary search tree lookup engine. Each tree level is a pipeline stage. [Figure labels: Piece of memory; Stage; Address of a node; Searched key; Comparator]

Design And Architecture (cont.): Binary search tree lookup engine. [Figure annotation: containing the data associated with the key. Figure labels: Piece of memory; Stage; Address of a node; Searched key; Comparator]

Design And Architecture (cont.): Binary search tree lookup engine. [Figure annotations: atomic operations; result corrected according to a register. Figure labels: Piece of memory; Stage; Address of a node; Searched key; Comparator]

Design And Architecture (cont.): Binary search tree lookup engine. The capacity of the BST-based engine can be configured by the number of BST levels $l$: $C_{bst} = 2^{l} - 1$.
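
To make the level-per-stage idea and the capacity formula concrete, here is a small software model. It is a sketch under stated assumptions: the tree is complete (exactly 2^l - 1 records), each level is stored in its own list standing in for that stage's piece of memory, and the helper names `build_levels` and `bst_lookup` are invented for the example.

```python
def build_levels(sorted_records, levels):
    """Split 2**levels - 1 sorted (key, data) records into one list per level."""
    if len(sorted_records) != 2 ** levels - 1:
        raise ValueError("this sketch expects a completely filled tree")
    memories = []
    for k in range(levels):
        step = 2 ** (levels - 1 - k)
        # Node j of level k is the record at sorted position (2j + 1) * step - 1.
        memories.append(
            [sorted_records[(2 * j + 1) * step - 1] for j in range(2 ** k)]
        )
    return memories

def bst_lookup(memories, key):
    """Walk one level per step; return (found flag, data)."""
    addr = 0  # address of the node inside the current level's memory
    for memory in memories:
        node_key, data = memory[addr]
        if key == node_key:
            return True, data
        # One comparison per level selects the child address in the next level.
        addr = 2 * addr + (1 if key > node_key else 0)
    return False, None

records = [(k, f"d{k}") for k in (10, 20, 30, 40, 50, 60, 70)]
memories = build_levels(records, 3)  # l = 3 gives capacity 2**3 - 1 = 7
```

With l = 3 the root level holds key 40, the second level holds 20 and 60, and the last level holds 10, 30, 50 and 70, so a lookup needs at most three comparisons.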

Design And Architecture (cont.): A. Lookup engine interface and functionality; B. Cuckoo hash lookup engine; C. Binary search tree lookup engine; D. Top-level lookup engine

Design And Architecture (cont.): Top-level lookup engine. The cuckoo and BST engines both operate in parallel, and both results are stored in FIFOs. [Figure labels: Cuckoo engine; BST engine; Stash; FIFO]

Design And Architecture (cont.): Top-level lookup engine. The maximum capacity of the cuckoo hash with stash lookup engine can be defined as $C_{total} = d \times t + 1 + s$, where $d$ and $t$ are the cuckoo hash parameters and $s$ is the stash size.

Evaluation And Results: Memory utilization can be computed in two basic ways: $U_{cuckoo} = (n - m)/C_{cuckoo}$ and $U_{total} = n/C_{total}$, where $n$ is the total number of keys successfully inserted before the memory became full and $m$ is the number of keys that reside in the stash. The stash can always be filled to 100% of its capacity, so $m = s$ can always be assumed. The values of $n$ must be acquired from test runs.
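
The two utilization measures are simple ratios over the capacity formulas given earlier in the talk. The sketch below just evaluates them; the parameter values (d = 2, t = 1024, s = 64, n = 1200) are hypothetical numbers chosen for illustration and are not results from the paper.

```python
def cuckoo_capacity(d: int, t: int) -> int:
    # d hash tables of t records each, plus one reconfiguration register.
    return d * t + 1

def total_capacity(d: int, t: int, s: int) -> int:
    # Cuckoo capacity plus a stash of s records.
    return cuckoo_capacity(d, t) + s

def utilization(n: int, d: int, t: int, s: int):
    """Return (U_cuckoo, U_total) for n successfully inserted keys."""
    if n < s:
        raise ValueError("sketch assumes the stash is full, so n must be >= s")
    m = s  # per the slide: the stash can always be filled completely
    u_cuckoo = (n - m) / cuckoo_capacity(d, t)
    u_total = n / total_capacity(d, t, s)
    return u_cuckoo, u_total

# Hypothetical configuration and test-run result.
u_cuckoo, u_total = utilization(n=1200, d=2, t=1024, s=64)
```

Here `n` is an input because, as the slide notes, it can only be obtained from test runs.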

Evaluation And Results: The evaluation examines the relation between the achievable memory utilization of the cuckoo hash and the stash size used, for different parameters. The memory utilization plotted in the graphs is $U_{cuckoo}$, and the stash size $s$ is plotted as a fraction of $t$.

Evaluation And Results (cont.): [Four slides of graphs; figures not preserved in the transcript]

Conclusion: The proposed architecture combines the cuckoo hash engine with the BST engine, with a focus on parallel implementation in FPGA.

THANK YOU