CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem Dept. of Computer Science University of Pittsburgh

Presentation transcript:

CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem Dept. of Computer Science University of Pittsburgh

ISPASS 2007
Search ops in applications
- Search (or lookup) operations represent an important common function
- Network packet processing
  - For each arriving packet, determine the output port
  - Given packet information, find a matching classification rule
  - Each lookup can incur many memory accesses
- Speech recognition
  - Searching (e.g., dictionary lookup) takes up ~24% of CPU cycles
- Forthcoming RMS (Recognition, Mining, and Synthesis) apps

Search performance and power
- Search performance must match increasing line speeds
  - For OC-768, up to 104M packets must be processed per second
  - Network traffic has doubled every year [McKeown03]
  - Routing tables (~200K prefixes in a core router) are growing [RIS]
  - IPv6
- Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
- Search in battery-operated devices should be energy-efficient
- Conventional search solutions
  - Software methods (tries, hash tables, …)
  - Hardware methods (CAM, TCAM, …)

IP lookup using a trie
- Consider an IP address:
- Software approach is “flexible”
  → high memory capacity requirement
  → high memory bandwidth requirement
  → not SCALABLE
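The memory-bandwidth concern on this slide can be illustrated with a minimal binary-trie sketch. The class and function names, the prefixes, and the port numbers below are hypothetical and are not taken from the talk. The lookup counts one node visit per address bit consumed, which is the "many memory accesses" cost the slide alludes to.

```python
class TrieNode:
    """One trie node; in a real table each node visit is a memory access."""
    def __init__(self):
        self.children = {}   # '0' or '1' -> TrieNode
        self.port = None     # output port if a prefix ends here

def insert(root, prefix, port):
    """Store a prefix given as a bit string, e.g. '0110'."""
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.port = port

def lookup(root, addr_bits):
    """Longest-prefix match; returns (port, number of nodes visited)."""
    node, best, accesses = root, root.port, 1
    for bit in addr_bits:
        node = node.children.get(bit)
        if node is None:
            break
        accesses += 1
        if node.port is not None:
            best = node.port   # remember the longest match seen so far
    return best, accesses

# Hypothetical routing table (prefix -> port)
root = TrieNode()
for prefix, port in [("0", 1), ("10", 2), ("0110", 3), ("01101", 4)]:
    insert(root, prefix, port)

result = lookup(root, "01101011")
```

In this sketch the number of node visits grows with the length of the matched prefix, so a 32-bit or 128-bit address can require many dependent memory reads per packet.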

IP lookup using TCAM
- Consider an IP address:
- [Figure: TCAM contents: * * * 01000* 01100* 01101* 11011* 0100* 0110* 1101* 10* 0*]
- Sort before storing
- Choose the first among the matched
  → high bandwidth, constant time lookup
  → TCAMs are relatively small, expensive
  → power consumption very high
  → not SCALABLE
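The "sort before storing, choose the first among the matched" rule can be sketched in software as follows. This is only an emulation under stated assumptions: a real TCAM compares all entries in parallel and a priority encoder picks the first hit, whereas the loop below scans sequentially. The prefixes are the ones listed on the slide; the function names and port numbers are hypothetical.

```python
def build_tcam(table):
    """Sort prefixes by decreasing length so the first hit is the longest match."""
    return sorted(table.items(), key=lambda item: -len(item[0]))

def tcam_lookup(entries, addr_bits):
    """Return (prefix, port) of the first matching entry, or None."""
    for prefix, port in entries:        # stands in for a parallel compare
        if addr_bits.startswith(prefix):
            return prefix, port         # stands in for the priority encoder
    return None

# Prefixes from the slide; port numbers are made up for illustration
prefixes = ["01000", "01100", "01101", "11011", "0100", "0110", "1101", "10", "0"]
table = {p: i for i, p in enumerate(prefixes)}
entries = build_tcam(table)

hit = tcam_lookup(entries, "01101011")
```

Because the entries are ordered by length, the address 01101011 hits 01101* before the shorter matching prefixes 0110* and 0*.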

CA-RAM – a hybrid approach
- Can we do better than the existing conventional schemes?
  - CAM-like search performance
  - RAM-like cost and power
- CA-RAM combines hashing w/ hardware parallel matching
- CA-RAM design goals
  - High lookup performance
  - Low power consumption
  - Smaller chip area per stored datum
  - Straightforward system-level integration

Talk roadmap
- What is CA-RAM?
- Prototype design
- Case study 1: IP lookup
- Case study 2: Trigram lookup for speech recognition

CA-RAM – Content Addressable RAM
- Separate match logic and memory
- Match logic for a single row, not every row
- Allows the use of dense RAM technology
- Enables highly reconfigurable match logic
- Keep keys sorted in each row, not in entire array
[Figure: match logic and memory cells in a conventional CAM/TCAM vs. CA-RAM]

Very simple, yet efficient
- Use hashing to store keys in a particular row
- To look up, hash the search key and retrieve one row
- Perform matching on entire row in parallel
- Achieve full content addressability w/o paying overhead!
[Figure: index generator, memory rows holding keys (Key i1, Key i2, …; Key j1, Key j2, …), match processors 1, 2, …, and the search key]
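The three steps on this slide (hash to a row, fetch that row, match every key in it) can be sketched as a small software model. Everything here is an illustrative assumption: the class name, the modulo index generator, and integer keys are not from the talk, and the list comprehension merely stands in for the hardware match processors that compare all slots of a row at once.

```python
class CARAM:
    """Toy model of a CA-RAM: a RAM array of rows, each row holding several keys."""

    def __init__(self, num_rows, slots_per_row):
        self.rows = [[] for _ in range(num_rows)]
        self.slots_per_row = slots_per_row

    def index(self, key):
        # Index generator (hypothetical hash): pick the row for a key
        return key % len(self.rows)

    def insert(self, key, data):
        row = self.rows[self.index(key)]
        if len(row) >= self.slots_per_row:
            return False                  # bucket overflow, not handled here
        row.append((key, data))
        return True

    def lookup(self, search_key):
        row = self.rows[self.index(search_key)]   # one memory access
        # In hardware, all slots of the row are compared in parallel
        matches = [data for key, data in row if key == search_key]
        return matches[0] if matches else None

cam = CARAM(num_rows=4, slots_per_row=2)
cam.insert(5, "a")
cam.insert(9, "b")      # 5 and 9 hash to the same row
overflowed = not cam.insert(13, "c")
```

A lookup touches exactly one row regardless of how many keys are stored, which is where the CAM-like behavior comes from in this model.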

Pipelined CA-RAM operation
- Step 1: Index generation
- Step 2: Memory access
- Step 3: Key matching
- Step 4: Result forwarding
[Figure: the search key enters the index generator; the indexed row (Key j1, Key j2, Key j3) is read from memory and compared with the search key in match processor 2; the result is forwarded]

Dealing w/ bucket overflows
- Careful design of hash function
- Increase bucket size
  - Reduce load factor (α); α = # of occupied entries / # of total entries
- Use “chaining”; store overflows in subsequent rows
  - Multiple accesses per lookup
- Use a small overflow CAM, accessed in parallel
  - Similar to popular “victim caching”
- Use two-level hashing and employ multiple CA-RAM banks
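One of the options above, the small overflow CAM, can be sketched together with the load-factor definition. This is a toy model under assumptions: the class name and modulo hash are hypothetical, a Python dict stands in for the overflow CAM, and the load factor here counts only entries held in the main array.

```python
class CARAMWithOverflow:
    """Toy CA-RAM with a small overflow store for spilled entries."""

    def __init__(self, num_rows, slots_per_row):
        self.rows = [dict() for _ in range(num_rows)]
        self.slots_per_row = slots_per_row
        self.overflow = {}    # stands in for the small overflow CAM

    def _row(self, key):
        return self.rows[key % len(self.rows)]   # hypothetical hash

    def insert(self, key, data):
        row = self._row(key)
        if key in row or len(row) < self.slots_per_row:
            row[key] = data
        else:
            self.overflow[key] = data             # bucket is full: spill

    def lookup(self, key):
        # In hardware the row and the overflow CAM would be probed in parallel
        row = self._row(key)
        if key in row:
            return row[key]
        return self.overflow.get(key)

    def load_factor(self):
        # alpha = occupied entries / total entries (main array only, by assumption)
        occupied = sum(len(row) for row in self.rows)
        return occupied / (len(self.rows) * self.slots_per_row)

mem = CARAMWithOverflow(num_rows=4, slots_per_row=2)
for key, data in [(1, "a"), (5, "b"), (9, "c")]:   # all three hash to row 1
    mem.insert(key, data)
```

With three keys colliding in a two-slot row, the third spills to the overflow store yet is still found in a single lookup step, which mirrors the victim-cache analogy on the slide.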

CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
- Adapting key size to apps
  - Same hardware to support multiple apps or standards

Adapting key size
- Adapting key size is straightforward
- Will benefit supporting multiple apps/standards
[Figure: a row of keys (Key i1, Key i2, Key i3; Key j1, Key j2, Key j3) feeds reconfigurable match logic, which selects key bits for matching and outputs match information]
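As a rough illustration of adapting the key size, the sketch below treats one fetched row as a bit string and lets a configuration parameter decide how it is carved into keys. The function names and the bit-string representation are assumptions made for this example, not the prototype's actual interface.

```python
def split_row(row_bits, key_width):
    """Carve a fetched row (a bit string) into fixed-width keys."""
    last_start = len(row_bits) - key_width
    return [row_bits[i:i + key_width] for i in range(0, last_start + 1, key_width)]

def match_row(row_bits, key_width, search_key):
    """Return the slot numbers whose key equals the search key."""
    keys = split_row(row_bits, key_width)
    return [slot for slot, key in enumerate(keys) if key == search_key]

row = "0001001000110100"   # the same 16 stored bits in both configurations

as_4bit = split_row(row, 4)    # four 4-bit keys
as_8bit = split_row(row, 8)    # two 8-bit keys
```

The same stored row serves a 4-bit application and an 8-bit application; only the configured key width changes.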

CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
- Adapting key size to apps
  - Same hardware to support multiple apps or standards
- Binary and ternary matching
  - Some apps require ternary matching, some don’t

Supporting binary/ternary matching
- Developed configurable comparator
- T-matching requires 2 bits / 1 symbol
- Supporting different types of matching in different bit positions feasible
[Figure: keys with masks (Key i1, Mask i1; Key j1, Mask j1; Key i2, Key j2) and the search key feed reconfigurable match logic, which can consider mask bits or not and outputs match information]
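A configurable comparator of this kind can be sketched with a key word plus a mask word per entry, which is one common way to spend two bits per ternary symbol. The encoding (mask bit 1 means "compare", 0 means "don't care"), the function names, and the 8-bit examples are assumptions for illustration.

```python
def binary_match(stored_key, search_key):
    """Exact match: every bit must agree."""
    return stored_key == search_key

def ternary_match(stored_key, mask, search_key):
    """Ternary match: only bit positions with mask bit 1 are compared."""
    return ((stored_key ^ search_key) & mask) == 0

def match_row(row, search_key, ternary=True):
    """Return the slots of a row that match; `ternary` selects the comparator mode."""
    hits = []
    for slot, (key, mask) in enumerate(row):
        ok = ternary_match(key, mask, search_key) if ternary else binary_match(key, search_key)
        if ok:
            hits.append(slot)
    return hits

# 8-bit examples: 0110**** and 10******
row = [(0b01100000, 0b11110000), (0b10000000, 0b11000000)]

ternary_hits = match_row(row, 0b01101011)
binary_hits = match_row(row, 0b01101011, ternary=False)
```

The same stored row yields a hit in ternary mode and no hit in binary mode, which is the "consider mask bits or not" switch in the figure.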

CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
- Adapting key size to apps
  - Same hardware to support multiple apps or standards
- Binary and ternary matching
  - Some apps require ternary matching, some don’t
- Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for a lookup by half

Simult. key matching & data access
- Data access follows TCAM lookup
- CA-RAM supports data embedding
- Cuts memory traffic & latency by half
[Figure: keys stored alongside data (Key i1, Data i1; Key j1, Data j1; Key i2, Key j2) and the search key feed reconfigurable match logic, which matches the key and bypasses the data, producing the match result and data together]

CA-RAM reconfig. opportunities
Reconfigurable match logic allows:
- Adapting key size to apps
  - Same hardware to support multiple apps or standards
- Binary and ternary matching
  - Some apps require ternary matching, some don’t
- Storing data and keys in a CA-RAM module
  - Cuts # of memory accesses for IP lookup by half
- Providing range checking capabilities
  - Beneficial for rule-based packet filtering

Supporting range checking
- (Range checking causes trouble)
- (Entries must be expanded)
- CA-RAM can support range checking efficiently
[Figure: keys stored with ranges (Key i1, Range i1; Key j1, Range j1) and the search key feed reconfigurable match logic, which matches the key and checks the range, producing match information]
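The parenthetical remarks on this slide presumably refer to the well-known need to expand a range into several prefixes when only ternary matching is available. The sketch below contrasts that expansion with a single entry that stores the range bounds directly. The function names, the 4-bit example, and the filter rule (protocol 6, ports 1024 to 65535) are hypothetical.

```python
def range_to_prefixes(lo, hi, width):
    """Cover [lo, hi] with aligned power-of-two blocks, returned as (value, prefix_len)."""
    prefixes = []
    while lo <= hi:
        # Largest aligned block that starts at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it no longer overshoots hi
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, width - (size.bit_length() - 1)))
        lo += size
    return prefixes

def range_entry_match(entry, key, value):
    """One entry with explicit bounds: match the key, then check the range."""
    stored_key, lo, hi = entry
    return stored_key == key and lo <= value <= hi

expanded = range_to_prefixes(1, 14, 4)    # the 4-bit range 1..14
rule = (6, 1024, 65535)                   # hypothetical rule: protocol 6, ports 1024..65535
```

In this example the range 1..14 needs six prefix entries, while the range-checking form needs a single entry per rule.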

CA-RAM-based memory subsystem

Prototype implementation
- We implemented a prototype CA-RAM slice design (w/ a degree of reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs
- We used a standard cell (0.16 μm) based ASIC design flow

Step                      # cells    Area, μm²    Delay, ns
Expand search key         3,804      66,228       (0.89)
Calculate match vector    5,252      10,…         …
Decode match vector       899        1,…          …
Extract result            6,037      21,…         …
Total                     15,992     100,…        …

Area and power: CA-RAM vs. TCAM
[Figure: bar charts of per-cell area (μm²) and power (W) at 4.5Mb; annotated ratios 4.5x and 11x for area, 14x and 4x for power]
- CA-RAM area advantage 4.5x~11x
- CA-RAM power advantage 4x~14x

Performance: CA-RAM vs. (T)CAM

Case study 1: IP lookup

Problem description
- Given
  - A set of prefixes (each prefix is associated with an output port number)
  - An IP address
- Find a prefix that matches the input IP address and return the output port number associated with it
  - In the presence of multiple matching prefixes, choose the longest
- Procedure
  - Find a good hash function to distribute prefixes
  - Determine CA-RAM organization

Data set and hashing method
- IP core router’s table having 186,760 entries
- Bit selection scheme [Zane et al. ‘03]
  - 98% of prefixes are at least 16 bits long
  - Select hash bits from the first 16 bits (low-order bits)
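Bit selection can be sketched as follows, reading "low-order bits" as the least significant bits of the first 16 address bits; that reading, the 11-bit row index (2,048 rows), the helper names, and the sample routes are all assumptions. Prefixes shorter than 16 bits (the remaining 2%) need separate handling, which this sketch simply rejects.

```python
ROW_BITS = 11                      # assumed: 2,048 rows
rows = [[] for _ in range(1 << ROW_BITS)]

def ip(a, b, c, d):
    """Pack a dotted quad into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d

def row_index(addr):
    """Bit selection: low-order ROW_BITS of the first 16 address bits."""
    return (addr >> 16) & ((1 << ROW_BITS) - 1)

def insert(prefix, length, port):
    if length < 16:
        raise ValueError("prefixes shorter than 16 bits are not handled in this sketch")
    rows[row_index(prefix)].append((prefix, length, port))

def lookup(addr):
    """Fetch one row, match every entry, keep the longest matching prefix."""
    best_len, best_port = -1, None
    for prefix, length, port in rows[row_index(addr)]:
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        if (addr & mask) == prefix and length > best_len:
            best_len, best_port = length, port
    return best_port

insert(ip(10, 1, 0, 0), 16, 1)     # hypothetical routes
insert(ip(10, 1, 2, 0), 24, 2)
```

Because every prefix of at least 16 bits fixes the selected bits, a prefix and any address it covers always land in the same row, so one row fetch suffices for the longest-prefix decision in this model.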

Shaping CA-RAM
- Consider multiple design points: Designs A, B, C, D, E, and F
[Figure: design points built from 2,048 rows × (32 entries) and 4,096 rows × (64 entries); load factors shown: α = 0.47, α = 0.40, α = 0.36, α = 0.24, α = 0.36]

Performance
[Figure: spilled entries and average memory access latency (AMAL) for the design points (α = 0.47, 0.40, 0.36, 0.24, 0.36), under “uniform” and “skewed” traffic]
- With a properly chosen α, CA-RAM achieves near-constant AMAL

Area and power
- CA-RAM advantageous over TCAM
[Figure: relative area or power; Design B]

Case study 2: Trigram lookup in speech recognition

Problem, data set, and hashing
- Problem
  - Look up a trigram in the trigram database
- Data set
  - A subset of the Sphinx trigram database
  - We selected entries having 13~16 characters
  - Still 5,385,231 entries or 86MB
- Hashing
  - DJB, an efficient string hash function (used in Sphinx)
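For reference, a DJB-style string hash can be sketched as below. The slide does not say which variant is meant, so this assumes the widely circulated "djb2" form (start at 5381, multiply by 33, add each byte) truncated to 32 bits; the row count and the sample trigram are hypothetical.

```python
def djb2(data: bytes) -> int:
    """djb2 string hash (assumed variant): h = h * 33 + byte, kept to 32 bits."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

def row_index(trigram: str, num_rows: int) -> int:
    """Map a trigram string to a CA-RAM row (num_rows is a hypothetical parameter)."""
    return djb2(trigram.encode("utf-8")) % num_rows

index = row_index("how are you", 4096)
```

The hash is cheap (one multiply-add per character), which matters when the index must be generated for every lookup.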

Result

Data distribution

Area comparison
[Figure: relative area, CAM vs. CA-RAM]

CA-RAM conclusions
- Compared w/ software methods
  - Fewer memory accesses; higher lookup performance
- Compared w/ CAM or TCAM
  - Higher density, matching that of DRAM → large lookup table
  - Competitive performance
  - Low power – a critical advantage for cost-effective system design
  - Reconfigurable
    - Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, …
    - Can adopt new standards much more easily, e.g., IPv6
- Two case studies show the efficacy of the CA-RAM approach
  - 3~5× improvement in area and power, compared with CAM/TCAM

CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications Questions?