Modular SRAM-based Binary Content-Addressable Memories Ameer M.S. Abdelhadi and Guy G.F. Lemieux Department of Electrical and Computer Engineering University.

Slides:



Advertisements
Similar presentations
Memory.
Advertisements

Enhanced matrix multiplication algorithm for FPGA Tamás Herendi, S. Roland Major UDT2012.
A Search Memory Substrate for High Throughput and Low Power Packet Processing Sangyeun Cho, Michel Hanna and Rami Melhem Dept. of Computer Science University.
A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.
A HIGH-PERFORMANCE IPV6 LOOKUP ENGINE ON FPGA Author : Thilan Ganegedara, Viktor Prasanna Publisher : FPL 2013.
Bio Michel Hanna M.S. in E.E., Cairo University, Egypt B.S. in E.E., Cairo University at Fayoum, Egypt Currently is a Ph.D. Student in Computer Engineering.
Efficient Memory Utilization on Network Processors for Deep Packet Inspection Piti Piyachon Yan Luo Electrical and Computer Engineering Department University.
CPSC 335 Dr. Marina Gavrilova Computer Science University of Calgary Canada.
M. Waldvogel, G. Varghese, J. Turner, B. Plattner Presenter: Shulin You UNIVERSITY OF MASSACHUSETTS, AMHERST – Department of Electrical and Computer Engineering.
Multithreaded FPGA Acceleration of DNA Sequence Mapping Edward Fernandez, Walid Najjar, Stefano Lonardi, Jason Villarreal UC Riverside, Department of Computer.
1 A Tree Based Router Search Engine Architecture With Single Port Memories Author: Baboescu, F.Baboescu, F. Tullsen, D.M. Rosu, G. Singh, S. Tullsen, D.M.Rosu,
Power Efficient IP Lookup with Supernode Caching Lu Peng, Wencheng Lu*, and Lide Duan Dept. of Electrical & Computer Engineering Louisiana State University.
Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers Author: Jing Fu, Jennifer Rexford Publisher: ACM CoNEXT 2008 Presenter:
Overview Memory definitions Random Access Memory (RAM)
Memory Management.
Efficient Multi-Match Packet Classification with TCAM Fang Yu
Study of IP address lookup Schemes
Packet Classification George Varghese. Original Motivation: Firewalls Firewalls use packet filtering to block say ssh and force access to web and mail.
EaseCAM: An Energy And Storage Efficient TCAM-based IP-Lookup Architecture Rabi Mahapatra Texas A&M University;
Computer Architecture Part III-C: Memory Access and Management.
1 Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs Department of Computer Science and Information Engineering National Cheng.
Redundant Array of Independent Disks
Sarang Dharmapurikar With contributions from : Praveen Krishnamurthy,
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
PEDS: Parallel Error Detection Scheme for TCAM Devices David Hay, Politecnico di Torino Joint work with Anat Bremler Barr (IDC, Israel), Danny Hendler.
Efficient Multi-Ported Memories for FPGAs Eric LaForest Greg Steffan University of Toronto Computer Engineering Research Group February 22, 2010.
IP Address Lookup Masoud Sabaei Assistant professor
Chapter 8 Memory Management Dr. Yingwu Zhu. Outline Background Basic Concepts Memory Allocation.
Theory and Applications of GF(2 p ) Cellular Automata P. Pal Chaudhuri Department of CST Bengal Engineering College (DU) Shibpur, Howrah India (LOGIC ON.
Making FPGAs a Cost-Effective Computing Architecture Tom VanCourt Yongfeng Gu Martin Herbordt Boston University BOSTON UNIVERSITY.
(TPDS) A Scalable and Modular Architecture for High-Performance Packet Classification Authors: Thilan Ganegedara, Weirong Jiang, and Viktor K. Prasanna.
Timothy Whelan Supervisor: Mr Barry Irwin Security and Networks Research Group Department of Computer Science Rhodes University Hardware based packet filtering.
GLOBECOM (Global Communications Conference), 2012
FPGA Based String Matching for Network Processing Applications Janardhan Singaraju, John A. Chandy Presented by: Justin Riseborough Albert Tirtariyadi.
IT253: Computer Organization
A Decompression Architecture for Low Power Embedded Systems Lekatsas, H.; Henkel, J.; Wolf, W.; Computer Design, Proceedings International.
CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem Dept. of Computer.
PEDS: A PARALLEL ERROR DETECTION SCHEME FOR TCAM DEVICES Author: Anat Bremler-Barr, David Hay, Danny Hendler and Ron M. Roth Publisher/Conf.: IEEE INFOCOM.
8.1 Silberschatz, Galvin and Gagne ©2005 Operating System Principles Implementation of Page Table Page table is kept in main memory Page-table base.
DATA STRUCTURE & ALGORITHMS (BCS 1223) NURUL HASLINDA NGAH SEMESTER /2014.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Virtual Memory 1 1.
1 Fast packet classification for two-dimensional conflict-free filters Department of Computer Science and Information Engineering National Cheng Kung University,
Scalable High Speed IP Routing Lookups Scalable High Speed IP Routing Lookups Authors: M. Waldvogel, G. Varghese, J. Turner, B. Plattner Presenter: Zhqi.
P-Tree Implementation Anne Denton. So far: Logical Definition C.f. Dr. Perrizo’s slides Logical definition Defines node information Representation of.
A Small IP Forwarding Table Using Hashing Yeim-Kuan Chang and Wen-Hsin Cheng Dept. of Computer Science and Information Engineering National Cheng Kung.
9.1 Operating System Concepts Paging Example. 9.2 Operating System Concepts.
CS 740: Advanced Computer Networks IP Lookup and classification Supplemental material 02/05/2007.
A Flexible Interleaved Memory Design for Generalized Low Conflict Memory Access Laurence S.Kaplan BBN Advanced Computers Inc. Cambridge,MA Distributed.
Charles Kime & Thomas Kaminski © 2008 Pearson Education, Inc. (Hyperlinks are active in View Show mode) Chapter 8 – Memory Basics Logic and Computer Design.
1 KU College of Engineering Elec 204: Digital Systems Design Lecture 22 Memory Definitions Memory ─ A collection of storage cells together with the necessary.
Memory Management Continued Questions answered in this lecture: What is paging? How can segmentation and paging be combined? How can one speed up address.
8.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition Fragmentation External Fragmentation – total memory space exists to satisfy.
Packet Classification Using Multidimensional Cutting Sumeet Singh (UCSD) Florin Baboescu (UCSD) George Varghese (UCSD) Jia Wang (AT&T Labs-Research) Reviewed.
Hierarchical packet classification using a Bloom filter and rule-priority tries Source : Computer Communications Authors : A. G. Alagu Priya 、 Hyesook.
W4118 Operating Systems Instructor: Junfeng Yang.
A Multi-Ported Memory Compiler Utilizing True Dual- port BRAMs Ameer Abdelhadi and Guy Lemieux Department of Electrical and Computer Engineering University.
IP Routers – internal view
FPGAs in AWS and First Use Cases, Kees Vissers
Paging and Segmentation
Transport Layer Systems Packet Classification
Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform
Introduction to Operating Systems
Scalable Memory-Less Architecture for String Matching With FPGAs
Lecture 3: Main Memory.
Shared Memory Accesses
Ameer M.S. Abdelhadi*, Guy G.F. Lemieux+, and Lesley Shannon*
CS703 - Advanced Operating Systems
Path Oram An Extremely Simple Oblivious RAM Protocol
Presentation transcript:

Modular SRAM-based Binary Content-Addressable Memories Ameer M.S. Abdelhadi and Guy G.F. Lemieux Department of Electrical and Computer Engineering University of British Columbia Vancouver, Canada a place of mind THE UNIVERSITY OF BRITISH COLUMBIA

Binary Content-Addressable Memory (BCAM) 1/17 Hardware-based Single-Cycle Parallel Search Engines Write Stores new data at specific address Match Search all addresses for a given data (pattern) Found in ‘2’ B C D A BCAM Search for ‘D’ B C D A BCAM Store ‘C’ in ‘1’

BCAM Applications Memory management Associative caches Translation lookaside buffers (TLBs) Databases Eliminates memory bottleneck Networking IP lookup in routing/ forwarding tables Intrusion detection detect predefined suspicious packages Packet Classification Pattern matching e.g. DNA sequence lookup Data compression find and shorten redundant patterns Data encryption find and encrypt specific patterns 2/17

Motivation - FPGAs 3/ ’s Memory Blocks 100,000’s Logic Elements No dedicated BCAM resources in FPGAs

Objectives BCAMs Massively parallel memory search Require high memory bandwidth FPGAs Block RAMs are main storageLimited memory bandwidth 4/17 Use BRAMs to construct BCAMs Modular and flexible Storage efficient Single-cycle Performance oriented

Associative Arrays Algorithmic Heuristics 5/17 HashesSearch Trees: Tries, BSTs, … Multi-unpredictable-cycle Data dependent performance Variable search depthMisses due to conflicts

Register-Based BCAM Concurrent register read and compare Single-cycle Limited resourcesComplex routingFits small BCAMs 6/17

Brute-Force Transposed-Indicators-RAM (1) A Traditional BRAM-based BCAM 7/17 Key idea: Transposed RAM - data becomes addresses Write Write ‘0’ to location ‘B’ Match Read location ‘D’ for match ‘2’ A B C D ‘D’ A B C D ‘0’ to ‘B’ * Xilinx App Notes

Brute-Force Transposed-Indicators-RAM (2) Storing Data to Multiple Addresses How can we store data to multiple addresses? Specify addresses using one-hot coding Each bit indicates a match or “store at location”  PROBLEM: Depth of CAM is limited by data width of RAM e.g. to build 1M deep CAM, we need 1M bits wide In FPGAs: 1000 BRAMs x 32bit wide = 32K deep CAM 8/17 BRAM-based Single-cycle Depth of CAM is limited by RAM width D C B A

BCAM Cascading PROBLEM: Patterns are encoded as RAM addresses  RAM depth is exponential to pattern width Solution: Cascading 1.Divide pattern into smaller slices 2.Search for each slice separately 3.If all slices are found  pattern match!  RAM depth is linear to pattern width 9/17 RAM Depth = 2 Pattern Width RAM Depth = 2 Slice Width x (Pattern Width / Slice Width) CAM Slice Matched Pattern Slice Match … … … Pattern Match

Hierarchical Search 2D BCAM (1) Narrow and Deep BCAM 10/17 Key idea: Hierarchical search 1D BCAM 2D BCAM 4M too deep 2k  Find a set (row) with match using a 1D BCAM  Search this set (row) in parallel for a specific match

Hierarchical Search 2D BCAM (2) Example Divide address space into segments RAM: each segment in a line Transposed-RAM: indicates “pattern in segment?” Hierarchical Search: 1.Find a row (segment) with match using a 1D BCAM 2.Search this row (segment) in parallel for a specific match 11/ addresses patterns addresses patterns Transposed-RAM RAM

Hierarchical Search 2D BCAM (2) Example Divide address space into sets RAM: each segment in a line Transposed-RAM: indicates “pattern in segment?” Hierarchical Search: 1.Find a row (segment) with match using a 1D BCAM 2.Search this row (segment) in parallel for a specific match 11/ addresses patterns addresses patterns Transposed-RAM RAM

Hierarchical Search 2D BCAM (2) Example Divide address space into sets RAM: each set in a line Transposed-RAM: indicates “pattern in segment?” Hierarchical Search: 1.Find a row (segment) with match using a 1D BCAM 2.Search this row (segment) in parallel for a specific match 11/ addresses patterns sets patterns Transposed-RAM RAM

Hierarchical Search 2D BCAM (2) Example Divide address space into sets RAM: each set in a line Transposed-RAM: indicates “pattern in set?” Hierarchical Search: 1.Find a row (segment) with match using a 1D BCAM 2.Search this row (segment) in parallel for a specific match 11/ sets patterns Transposed-RAM RAM sets

Hierarchical Search 2D BCAM (2) Example Divide address space into sets RAM: each set in a line Transposed-RAM: indicates “pattern in set?” Hierarchical Search: 1.Find a set (row) with match using a 1D BCAM 2.Search this row (segment) in parallel for a specific match 11/ patterns Transposed-RAM RAM sets Match pattern ‘3’

Hierarchical Search 2D BCAM (2) Example Divide address space into sets RAM: each set in a line Transposed-RAM: indicates “pattern in set?” Hierarchical Search: 1.Find a set (row) with match using a 1D BCAM 2.Search this set (row) in parallel for a specific match 11/ patterns Transposed-RAM RAM sets

Hierarchical Search 2D BCAM (3) Pros and Cons Single match only Cannot be cascaded RAM depth is exponential to pattern width Inefficient for wide patterns 12/17 BRAM-BasedSingle-cycle Efficient for deep CAMs

Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM 13/17 PROBLEM: is it possible to regenerate matches for all addresses? patterns addresses Match Indicators

Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM 13/17 PROBLEM: is it possible to regenerate matches for all addresses? Key observation Transposed RAM is a sparse matrix n columns (set of addresses) accommodates n matches (1’s) at most! patterns addresses Match Indicators

Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM 13/17 Key idea: use indirect indices to point to intra-set matches Cascadable Scalable (linear growth) Supports wider patterns PROBLEM: is it possible to regenerate matches for all addresses? Key observation Transposed RAM is a sparse matrix n columns (set of addresses) accommodates n matches (1’s) at most! patterns Sets Indicators Indices Intra-set Match Indicators

Indirectly-Indexed 2D (II2D) BCAM (1) Cascadable Wide and Deep BCAM 13/17 Key idea: use indirect indices to point to intra-set matches Cascadable Scalable (linear growth) Supports wider patterns PROBLEM: is it possible to regenerate matches for all addresses? Key observation Transposed RAM is a sparse matrix n columns (set of addresses) accommodates n matches (1’s) at most! patterns Sets Indicators Indices BRAM LUTRAM Intra-set Match Indicators

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ addresses patterns Transposed-RAM patterns RAM (reference) addresses

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ addresses patterns Transposed-RAM patterns RAM (reference) addresses

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ addresses patterns Transposed-RAM Indicators-RAMs patterns RAM (reference) addresses

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ sets patterns Transposed-RAM Indicators-RAMs patterns RAM (reference) addresses

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ sets patterns Transposed-RAM Indicators-RAMs patterns RAM (reference) addresses Match pattern ‘3’

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ sets patterns Transposed-RAM Indicators-RAMs patterns RAM (reference) addresses Match pattern ‘3’

Indirectly-Indexed 2D (II2D) BCAM (2) Example Divide address space into sets Store sets with a match in Indicators-RAM Transposed-RAM stores indices to all matches in set Hierarchical Search: Find indices of all matching sets in Transposed-RAM Read Indicators-RAM using indices from Transposed-RAM 14/ sets patterns Transposed-RAM Indicators-RAMs patterns RAM (reference) addresses Match pattern ‘3’ Found in ‘1’ and ‘2’

Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax register-based BCAM register consumption II2D linear ALM consumption; similar to other methods HS exponential BRAM consumption II2D linear BRAM consumption II2D supports wider patterns 15/17

Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax register-based BCAM register consumption II2D linear ALM consumption; similar to other methods HS exponential BRAM consumption II2D linear BRAM consumption II2D supports wider patterns 15/17

Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax register-based BCAM register consumption II2D linear ALM consumption; similar to other methods HS exponential BRAM consumption II2D linear BRAM consumption II2D supports wider patterns 15/17

Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax register-based BCAM register consumption II2D linear ALM consumption; similar to other methods HS exponential BRAM consumption II2D linear BRAM consumption II2D supports wider patterns 15/17

Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax register-based BCAM register consumption II2D linear ALM consumption; similar to other methods HS exponential BRAM consumption II2D linear BRAM consumption II2D supports wider patterns 15/17

Indirectly-Indexed 2D (II2D) BCAM (3) Area and Performance Except for very a narrow HS, II2D exhibits higher Fmax register-based BCAM register consumption II2D linear ALM consumption; similar to other methods HS exponential BRAM consumption II2D linear BRAM consumption II2D supports wider patterns 15/17

Open Source 16/17 Modular and parametric Verilog files Run-in-batch simulation and synthesis manager

Conclusions 17/17 Brute-Force Transposed-RAM BRAM-based Single-cycle Deep  Wide Cascadable patterns addresses Scalable Match Indicators

Conclusions 17/17 Hierarchical Search 2D BCAM BRAM-based Single-cycle Deep Wide  Cascadable  patterns sets Scalable  Set Match Indicators

Conclusions 17/17 Indirectly-Indexed 2D (II2D) BCAM BRAM-based Single-cycle Deep Wide Cascadable patterns sets Scalable Intra-set Match Indicators Indicators Indices Multi-unpredictable-cycle

Thank You!

Backup Slides

16 II2D Hierarchical Brute-Force patterns addresses Match Indicators patterns s e t s Match Indicators patterns s e t s Indicators Indices Match Indicators BRAM-based Single-cycle Deep  Wide Conclusions