On-Chip Logic Minimization
Roman Lysecky & Frank Vahid*
Department of Computer Science and Engineering
University of California, Riverside
*Also with the Center for Embedded Computer Systems, UC Irvine
This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and a Department of Education GAANN fellowship.
2 Introduction
Boolean logic minimization typically used during logic synthesis
Two-level logic minimization can be considered as a general optimization technique
Many applications can benefit from using logic minimization dynamically
  –IP routing table reduction
  –Access control list (ACL) reduction
3 On-Chip Minimization Applications (IP Routing Table Reduction)
[Figure: the destination IP address of an incoming IP packet is looked up in the routing table (columns: Prefix, Next hop, with prefixes such as 125.x.x.x); the longest prefix match is chosen to select the output port.]
4 On-Chip Minimization Applications (IP Routing Table Reduction)
Longest Prefix Match
  –Ternary CAM (McAuley & Francis, 1993)
      Stores 0, 1, *
      Store IP address as TCAM entry
      Store prefix length using the TCAM entry's mask
      Fast
      Smaller hardware resources than binary CAM
      Very large power consumption
  –How can we reduce hardware resources and power consumption?
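The lookup described on this slide can be illustrated with a minimal Python sketch. Each entry is modeled as a (value, mask, port) triple, where the mask marks the bits that must match and the remaining bits act as `*`. The names `make_entry`, `lookup`, and the sample table are hypothetical and not taken from the slides. A real TCAM compares all entries in parallel, whereas this sketch scans them one by one.

```python
import ipaddress

def make_entry(prefix, port):
    """Turn a CIDR prefix into a ternary entry: (value, mask, port)."""
    net = ipaddress.ip_network(prefix)
    return int(net.network_address), int(net.netmask), port

def lookup(table, address):
    """Return the port of the matching entry with the longest prefix, or None."""
    addr = int(ipaddress.ip_address(address))
    best_len, best_port = -1, None
    for value, mask, port in table:
        prefix_len = bin(mask).count("1")  # number of cared-about bits
        if (addr & mask) == value and prefix_len > best_len:
            best_len, best_port = prefix_len, port
    return best_port

# Hypothetical routing table, not taken from the slides.
table = [
    make_entry("10.0.0.0/8", 7),
    make_entry("10.1.0.0/16", 3),
    make_entry("10.1.2.0/24", 5),
]
```

With this table, an address under 10.1.2.0/24 matches all three entries and the /24 entry wins because it has the most cared-about bits.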
5 On-Chip Minimization Applications (IP Routing Table Reduction)
Mask Extension (Liu, 2002)
  –Uses two-level logic minimization
  –Performing minimization for each update too slow
  –Incremental update
      Existing minimized set becomes don't care set
      New route becomes single entry in on set
      Achieves an average of 50 updates/second
        –Not considering communication, though
[Figure: Original TCAM Entries P1 and P2 (columns: #, IP Addr., Mask, Next Hop) are reduced by logic minimization to the entry P1&P2 in the TCAM Entries after Mask Extension.]
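The core step behind combining entries such as P1 and P2 can be sketched as follows. This is a simplified illustration and not Liu's algorithm: it assumes two entries can be merged when they share a next hop and a mask and differ in exactly one cared-about bit, and it ignores interactions with other, overlapping entries. The function name `try_merge` and the 8-bit example values are hypothetical.

```python
def try_merge(a, b):
    """Merge two (value, mask, next_hop) entries that differ in one cared-about bit.

    Returns the merged entry, or None when the entries cannot be merged.
    """
    (value_a, mask_a, hop_a), (value_b, mask_b, hop_b) = a, b
    if hop_a != hop_b or mask_a != mask_b:
        return None
    diff = (value_a ^ value_b) & mask_a
    if diff == 0 or diff & (diff - 1):
        return None  # identical entries, or more than one differing bit
    new_mask = mask_a & ~diff  # the differing bit becomes a don't care
    return (value_a & new_mask, new_mask, hop_a)

# Hypothetical 8-bit entries: 1010**** and 1110**** with the same next hop.
p1 = (0b10100000, 0b11110000, 1)
p2 = (0b11100000, 0b11110000, 1)
merged = try_merge(p1, p2)  # 1*10****, a mask that is no longer a plain prefix
```

The merged mask is not contiguous, which a ternary entry can store even though it no longer corresponds to a prefix length.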
6 On-Chip Minimization Applications (Access Control List Reduction)
Access Control List (ACL)
  –Used to restrict IP traffic through network routers
  –ACL size can range anywhere from 300 (UCR CS&E Dept.) to 10,000 (AOL)
  –Common use is to block a particular protocol or port number to avoid attacks such as Denial of Service attacks
ACL Minimization
  –Similar approach as used for IP routing table reduction
  –However, order of the list must be preserved
[Figure: ACL Input Format, with fields including Type, Protocol, In IP, In Port, Out IP, Out Port, and Action]
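Why the order of the list matters can be seen in a small Python sketch of first-match evaluation. The rule format here is a simplified, hypothetical one (a dictionary of fields plus an action) and not the ACL input format from the slide. The default `deny` for unmatched packets is also an assumption.

```python
def first_match(rules, packet):
    """Return the action of the first rule whose fields all match the packet."""
    for fields, action in rules:
        if all(packet.get(name) == wanted for name, wanted in fields.items()):
            return action
    return "deny"  # assumed default when no rule matches

# Hypothetical rules: permit web traffic, then deny all other TCP traffic.
rules = [
    ({"protocol": "tcp", "port": 80}, "permit"),
    ({"protocol": "tcp"}, "deny"),
]
packet = {"protocol": "tcp", "port": 80}

in_order = first_match(rules, packet)
reordered = first_match(list(reversed(rules)), packet)
```

The same two rules give `permit` in the original order and `deny` when swapped, so a minimizer that reorders or merges overlapping rules freely could change which packets are allowed.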
7 Introduction (Off-chip Logic Minimization)
[Figure: a network router chip (Proc., I$, D$, MEM) transmits data to a server, the server executes the minimizer, and the result is transmitted back to the router.]
Slow due to communication
Sensitive to server failures
Security issues
8 Introduction (On-chip Logic Minimization)
[Figure: a network router chip (Proc., I$, D$, MEM) containing an on-chip minimizer built from an ARM7, memory, and DMA. Steps shown: Initialize Minimizer, Execute Minimizer, Indicate Completion.]
9 On-Chip Logic Minimization Requirements
On-chip Logic Minimization Requirements
  –Data Memory Resources
      On-chip minimization algorithms must be very memory conscious
  –Instruction Memory Resources
      On-chip minimization algorithm must incorporate simplified approaches that result in acceptable designs
  –Execution time
      Limited data and instruction memory will likely lead to longer execution times
      Must still remain reasonable
  –Quality of results
      Must be capable of producing solution relatively close to optimal
Focus on developing an on-chip logic minimization tool that produces acceptable results with reasonable increases in execution time while using limited memory resources.
10 ROCM
ROCM – Riverside On-Chip Minimizer
  –Two-level minimization tool
  –Utilizes a combination of approaches from Espresso-II (Brayton et al., 1984) and Presto (Svoboda & White, 1979)

Optimize(F,D) {
  OrderCubes(F)
  for i=1 to |F| {
    c = F_i
    (c',W) = IterativeExpand(F,D,c)
    F = (F ∪ c') - W
  }
}

IterativeExpand(F,D,c) {
  W = {}
  c' = c
  for i=1 to |c| {
    c' = Expand(c',i)
    (val,W') = ValidExpansion(F,D,c')
    if val = true
      W = W ∪ W'
    else
      Revert(c',i)
  }
  return (c',W)
}
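The pseudocode above can be made concrete with a small runnable Python sketch. It is an illustration under several assumptions, not ROCM's implementation: cubes are strings over `0`, `1`, and `-`; an expansion is taken to be valid when every minterm of the expanded cube lies in F or D; W is taken to be the cubes of F that the expanded cube contains; and `order_cubes` uses a guessed heuristic. The validity check enumerates minterms by brute force, which is only workable for a handful of variables and is not memory conscious in the way the slides require.

```python
from itertools import product

def minterms(cube):
    """Enumerate every fully specified input covered by a cube such as '1-0'."""
    choices = [("0", "1") if ch == "-" else (ch,) for ch in cube]
    return ("".join(bits) for bits in product(*choices))

def covers(cube, other):
    """True if `cube` contains `other` (a cube or a minterm)."""
    return all(c == "-" or c == o for c, o in zip(cube, other))

def valid_expansion(F, D, cube):
    """Assumed check: the cube must stay inside F and D; also report covered cubes."""
    allowed = F + D
    if not all(any(covers(a, m) for a in allowed) for m in minterms(cube)):
        return False, []
    return True, [f for f in F if covers(cube, f)]

def iterative_expand(F, D, c):
    """Try to turn each literal of c into a don't care, keeping valid expansions."""
    W, current = [], c
    for i in range(len(c)):
        if current[i] == "-":
            continue
        trial = current[:i] + "-" + current[i + 1:]
        ok, covered = valid_expansion(F, D, trial)
        if ok:
            current = trial
            W += [w for w in covered if w not in W]
        # otherwise `current` is left unchanged, which plays the role of Revert
    return current, W

def order_cubes(F):
    """Assumed heuristic: largest cubes (most don't cares) first."""
    return sorted(F, key=lambda cube: cube.count("-"), reverse=True)

def optimize(F, D):
    """Expand each cube in turn and drop the cubes it makes redundant."""
    pending, done = order_cubes(F), []
    while pending:
        c = pending.pop(0)
        expanded, W = iterative_expand(done + [c] + pending, D, c)
        done = [f for f in done if f not in W] + [expanded]
        pending = [f for f in pending if f not in W]
    return done
```

For example, `optimize(["00", "01", "11"], [])` reduces three minterms to the two cubes `0-` and `-1`, and `optimize(["00"], ["01"])` uses the don't care set to expand `00` into `0-`.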
11 ROCM Results – Quality (Full Routing Table Reduction)
Only 2% larger than optimal on average
12 ROCM Results – Performance (Incremental Update Execution Time)
ROCM executing on a 40 MHz ARM7 requires less than 1 second
[Figure: incremental update execution time, two panels: on a 500 MHz Sun Ultra60 and on a 40 MHz ARM7]
13 ROCM Results – Memory (Code Size and Data Memory Usage)
[Figure: two panels, Code Size and Data Memory Usage]
Small code size of only 22 kilobytes
Average data memory usage of only 1 megabyte
14 ROCM Results – Quality (Access Control List Reduction)
Only 2% larger than optimal on average
15 Customizing ROCM
ROCM Customization
  –Beneficial to optimize an algorithm for a particular application
  –Customize ROCM's data structures and algorithms for a particular input size
      Require less memory
      Reduce dynamic memory allocation
      Improve performance
  –Created ROCM-32 customized for IP routing table reduction
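One way to picture customizing data structures for a fixed input size is the Python sketch below. The slides do not describe ROCM-32's internals, so everything here is an assumption: a 32-variable cube is stored as two unsigned words (value and mask), and all storage is allocated once up front instead of per cube. The names `CubeTable32` and `covers32` are hypothetical, and the `array` type code `"I"` is assumed to be 32 bits wide, which holds on common platforms.

```python
from array import array

def covers32(value_a, mask_a, value_b, mask_b):
    """True if cube A contains cube B; mask bits set to 1 are cared-about positions."""
    cares_subset = (mask_a & ~mask_b) == 0          # A cares only where B cares
    values_agree = ((value_a ^ value_b) & mask_a) == 0
    return cares_subset and values_agree

class CubeTable32:
    """Fixed-capacity store of 32-variable cubes, two words per cube."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.words = array("I", [0] * (2 * capacity))  # allocated once
        self.count = 0

    def add(self, value, mask):
        if self.count == self.capacity:
            raise OverflowError("cube table is full")
        self.words[2 * self.count] = value & mask
        self.words[2 * self.count + 1] = mask
        self.count += 1

    def get(self, index):
        return self.words[2 * index], self.words[2 * index + 1]

# Hypothetical entries: a /16 prefix and a /24 prefix inside it.
cubes = CubeTable32(capacity=2)
cubes.add(0xC0A80000, 0xFFFF0000)
cubes.add(0xC0A80100, 0xFFFFFF00)
```

With a fixed width, containment becomes a couple of bitwise operations, and the table never allocates memory after construction.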
16 Customized ROCM-32 Results (IP Routing Table Reduction)
37% reduction in execution time vs. ROCM
11% reduction in data memory usage vs. ROCM
17 Conclusions
Presented Riverside On-Chip Minimizer (ROCM)
Feasible to execute logic minimization on chip
  –Can be executed on an embedded 40 MHz ARM7 in seconds for real networking problem sizes
  –Requires small code size (22 kilobytes)
  –Requires small data memory (1 megabyte)
Produces good results
  –On average only 2% larger than exact minimization
Shown usefulness for networking applications
18 Future Work
More Applications
  –May appear now that on-chip minimization is feasible
Dynamic HW/SW Partitioning
  –Dynamically partition executing binary to on-chip configurable logic
  –Logic minimization is used during the logic synthesis stage
  –Initial work on dynamic HW/SW partitioning presented at DAC 2003 yesterday in session 15