
On-Chip Logic Minimization
Roman Lysecky & Frank Vahid*
Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems, UC Irvine
This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and a Department of Education GAANN fellowship

2 Introduction
Boolean logic minimization is typically used during logic synthesis
Two-level logic minimization can be considered a general optimization technique
Many applications can benefit from using logic minimization dynamically
  – IP routing table reduction
  – Access control list (ACL) reduction

3 On-Chip Minimization Applications (IP Routing Table Reduction)
[Figure: the destination IP address of an incoming IP packet is looked up in a routing table of (prefix, next hop) entries, with prefixes such as 125.x.x.x mapped to output ports; the longest matching prefix is chosen]

4 On-Chip Minimization Applications (IP Routing Table Reduction)
Longest Prefix Match
  – Ternary CAM (McAuley & Francis, 1993)
      Store 0, 1, *
      Store IP address as TCAM entry
      Store prefix length using the TCAM entry's mask
      Fast
      Smaller hardware resources than binary CAM
      Very large power consumption
  – How can we reduce hardware resources and power consumption?
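The TCAM lookup described on this slide can be sketched in software. This is a minimal illustration, not the hardware mechanism: a real TCAM compares all entries in parallel, while the sketch loops over them. The prefixes, port names, and the helper names `tcam_entry` and `lookup` are assumptions made for the example and do not come from the slides.

```python
from ipaddress import ip_address, ip_network

def tcam_entry(prefix, next_hop):
    """Build a ternary entry (value, mask, prefix length, next hop) from a CIDR prefix.

    Mask bits set to 1 are compared; mask bits set to 0 are the '*' (don't care) bits.
    """
    net = ip_network(prefix)
    return int(net.network_address), int(net.netmask), net.prefixlen, next_hop

def lookup(table, dst):
    """Return the next hop of the longest prefix matching dst, or None if nothing matches."""
    addr = int(ip_address(dst))
    best = None
    for value, mask, length, next_hop in table:
        if (addr & mask) == value:          # ternary match: ignore don't-care bits
            if best is None or length > best[0]:
                best = (length, next_hop)   # keep the most specific match so far
    return best[1] if best else None

# Hypothetical routing table: a short prefix and a more specific one inside it.
table = [
    tcam_entry("10.0.0.0/8", "port 3"),
    tcam_entry("10.1.0.0/16", "port 7"),
]

print(lookup(table, "10.1.2.3"))   # matches both entries; the /16 wins
print(lookup(table, "10.9.9.9"))   # matches only the /8
```

Every routing table entry occupies one TCAM row, which is why reducing the number of entries reduces both hardware resources and power.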

5 On-Chip Minimization Applications (IP Routing Table Reduction)
Mask Extension (Liu, 2002)
  – Uses two-level logic minimization
  – Performing minimization for each update is too slow
  – Incremental update
      Existing minimized set becomes don't-care set
      New route becomes single entry in on-set
      Achieves an average of 50 updates/second
  – Not considering communication, though
[Figure: original TCAM entries P1 and P2 (columns: #, IP Addr., Mask, Next Hop) are reduced by logic minimization to a single entry P1&P2 after mask extension]
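The effect shown in the figure, two entries P1 and P2 collapsing into one, can be illustrated with a single merging step of the kind two-level minimization performs. The sketch below is an assumption-laden illustration rather than Liu's procedure: it merges two entries only when they share a mask and a next hop and differ in exactly one compared bit, and it ignores interactions with other entries in the table. The function name `try_merge` and the 8-bit example values are hypothetical.

```python
def try_merge(a, b):
    """Merge two (value, mask, next_hop) entries that differ in one compared bit.

    Returns the merged entry, or None if the two entries cannot be merged.
    """
    (va, ma, ha), (vb, mb, hb) = a, b
    if ma != mb or ha != hb:
        return None                      # different masks or different next hops
    diff = (va ^ vb) & ma                # compared bits where the values disagree
    if diff == 0 or diff & (diff - 1):
        return None                      # identical, or more than one differing bit
    new_mask = ma & ~diff                # the differing bit becomes a don't care
    return (va & new_mask, new_mask, ha)

# Hypothetical 8-bit entries: patterns 1010**** and 1110****, same next hop.
p1 = (0b10100000, 0b11110000, "port 3")
p2 = (0b11100000, 0b11110000, "port 3")

merged = try_merge(p1, p2)
print(format(merged[0], "08b"), format(merged[1], "08b"))  # value and mask for 1*10****
```

The merged mask is no longer a contiguous run of ones, which is what distinguishes an extended mask from an ordinary prefix mask.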

6 On-Chip Minimization Applications (Access Control List Reduction)
Access Control List (ACL)
  – Used to restrict IP traffic through network routers
  – ACL size can range anywhere from 300 (UCR CS&E Dept.) to 10,000 (AOL)
  – Common use is to block a particular protocol or port number to avoid attacks such as Denial of Service attacks
ACL Minimization
  – Similar approach as used for IP routing table reduction
  – However, order of the list must be preserved
ACL Input Format: Type | Protocol | In IP | Out Port | In Port | Out IP | Action
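Why the order of the list must be preserved can be seen from a small first-match example. This is a sketch under simplifying assumptions: rules carry only two hypothetical fields (`protocol`, `port`) instead of the full input format above, and the names `first_match`, `acl`, and `packet` are invented for the illustration.

```python
def first_match(acl, packet, default="deny"):
    """Return the action of the first rule whose fields all match the packet."""
    for fields, action in acl:
        if all(packet.get(key) == value for key, value in fields.items()):
            return action
    return default

# Hypothetical ACL: block one TCP port, then permit all other TCP traffic.
acl = [
    ({"protocol": "tcp", "port": 23}, "deny"),
    ({"protocol": "tcp"}, "permit"),
]
packet = {"protocol": "tcp", "port": 23}

print(first_match(acl, packet))                  # the specific deny rule is hit first
print(first_match(list(reversed(acl)), packet))  # reordered: the broad permit rule wins
```

Because overlapping rules can carry different actions, a minimizer may merge or drop rules only when the first-match result for every packet stays the same.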

7 Introduction (Off-chip Logic Minimization)
[Figure: a network router chip (Proc., I$, D$, MEM) transmits data to a server; the server executes the minimizer and transmits the result back to the router]
Slow due to communication
Sensitive to server failures
Security issues

8 Introduction (On-chip Logic Minimization)
[Figure: network router chip (Proc., I$, D$, MEM) with an on-chip minimizer consisting of an ARM7, memory, and DMA. Steps: initialize minimizer, execute minimizer, indicate completion]

9 On-Chip Logic Minimization Requirements
On-chip Logic Minimization Requirements
  – Data Memory Resources
      On-chip minimization algorithms must be very memory conscious
  – Instruction Memory Resources
      On-chip minimization algorithm must incorporate simplified approaches that result in acceptable designs
  – Execution time
      Limited data and instruction memory will likely lead to longer execution times
      Must still remain reasonable
  – Quality of results
      Must be capable of producing solution relatively close to optimal
Focus on developing an on-chip logic minimization tool that produces acceptable results with reasonable increases in execution time while using limited memory resources.

10 ROCM
ROCM – Riverside On-Chip Minimizer
  – Two-level minimization tool
  – Utilizes a combination of approaches from Espresso-II (Brayton, et al. 1984) and Presto (Svoboda & White, 1979)

Optimize(F,D) {
  OrderCubes(F)
  for i=1 to |F| {
    c = F_i
    (c',W) = IterativeExpand(F,D,c)
    F = (F ∪ c') - W
  }
}

IterativeExpand(F,D,c) {
  W = {}
  c' = c
  for i=1 to |c| {
    c' = Expand(c',i)
    (val,W') = ValidExpansion(F,D,c')
    if val = true
      W = W ∪ W'
    else
      Revert(c',i)
  }
  return (c',W)
}
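The pseudocode on this slide can be turned into a small runnable sketch. It follows the structure of Optimize and IterativeExpand, but several details are assumptions because the slide does not specify them: cubes are strings over '0', '1', and '-', OrderCubes sorts larger cubes first, and ValidExpansion is a brute-force check that every minterm of the expanded cube lies in the on-set or don't-care set. That brute-force check is only practical for a handful of variables and is not how ROCM stays within on-chip memory limits.

```python
from itertools import product

def minterms(cube):
    """Enumerate the fully specified minterms covered by a cube such as '1-0'."""
    options = [("0", "1") if ch == "-" else (ch,) for ch in cube]
    return {"".join(bits) for bits in product(*options)}

def contains(big, small):
    """True if cube `big` covers every minterm of cube `small`."""
    return all(b == "-" or b == s for b, s in zip(big, small))

def valid_expansion(F, D, cube):
    """Assumed check: valid if the cube stays inside on-set plus don't-care set.

    Also returns W', the cubes of F that the expanded cube makes redundant.
    """
    allowed = set()
    for c in list(F) + list(D):
        allowed |= minterms(c)
    if not minterms(cube) <= allowed:
        return False, set()
    return True, {c for c in F if c != cube and contains(cube, c)}

def iterative_expand(F, D, cube):
    """Try to raise each literal of the cube to '-', keeping only valid expansions."""
    covered = set()
    current = cube
    for i in range(len(cube)):
        if current[i] == "-":
            continue
        trial = current[:i] + "-" + current[i + 1:]   # Expand(c', i)
        ok, newly = valid_expansion(F, D, trial)
        if ok:
            current = trial
            covered |= newly
        # otherwise keep `current` unchanged, which plays the role of Revert(c', i)
    return current, covered

def optimize(F, D):
    """Expand each cube of F in turn and drop the cubes it comes to cover."""
    # Assumed OrderCubes: cubes with more don't cares first, ties broken by text.
    F = sorted(set(F), key=lambda c: (-c.count("-"), c))
    for cube in list(F):
        if cube not in F:
            continue                      # already absorbed by an earlier expansion
        expanded, covered = iterative_expand(F, D, cube)
        F = [c for c in F if c not in covered]
        if expanded not in F:
            F.append(expanded)
    return F

print(optimize(["000", "001", "010", "011"], []))   # four minterms collapse to ['0--']
print(optimize(["11"], ["10"]))                     # the don't care enables ['1-']
```

In the incremental update scheme from the mask extension slide, the existing minimized table would be passed as `D` and the new route as the single cube in `F`.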

11 ROCM Results – Quality (Full Routing Table Reduction)
Only 2% larger than optimal on average

12 ROCM Results – Performance (Incremental Update Execution Time)
[Charts: execution time on a 500 MHz Sun Ultra60 and on a 40 MHz ARM7]
ROCM executing on a 40 MHz ARM7 requires less than 1 second

13 ROCM Results – Memory (Code Size and Data Memory Usage)
[Charts: code size; data memory usage]
Small code size of only 22 kilobytes
Average data memory usage of only 1 megabyte

14 ROCM Results – Quality (Access Control List Reduction)
Only 2% larger than optimal on average

15 Customizing ROCM
ROCM Customization
  – Beneficial to optimize an algorithm for a particular application
  – Customize ROCM's data structures and algorithms for a particular input size
      Require less memory
      Reduce dynamic memory allocation
      Improve performance
  – Created ROCM-32 customized for IP routing table reduction

16 Customized ROCM-32 Results (IP Routing Table Reduction)
37% reduction in execution time vs. ROCM
11% reduction in data memory usage vs. ROCM

17 Conclusions
Presented Riverside On-Chip Minimizer (ROCM)
Feasible to execute logic minimization on chip
  – Can be executed on an embedded 40 MHz ARM7 in seconds for real networking problem sizes
  – Requires small code size (22 kilobytes)
  – Requires small data memory (1 megabyte)
Produces good results
  – On average only 2% larger than exact minimization
Shown usefulness for networking applications

18 Future Work
More Applications
  – May appear now that on-chip minimization is feasible
Dynamic HW/SW Partitioning
  – Dynamically partition executing binary to on-chip configurable logic
  – Logic minimization is used during the logic synthesis stage
  – Initial work on dynamic HW/SW partitioning presented at DAC 2003 yesterday in session 15