Published by Jeffrey Dixon. Modified over 9 years ago.
1
Codesigned On-Chip Logic Minimization
Roman Lysecky & Frank Vahid*
Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems, UC Irvine
This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and a Department of Education GAANN fellowship.
2
Introduction (On-chip Logic Minimization)
[Figure: System-On-Chip block diagram. Labels: ARM7, MEM, DMA, On-chip Minimizer, MEM, Proc., I$, D$]
Steps shown in the figure:
1. Initialize Minimizer
2. Execute Minimizer
3. Indicate Completion
3
On-Chip Minimization Applications (IP Routing Table Reduction)
[Figure: longest prefix match. The destination IP of an incoming packet, 138.23.16.9, is looked up in the routing table and resolves to Port 7.]
Prefix          Next hop
125.x.x.x       Port 3
138.23.x.x      Port 5
138.23.16.x     Port 7
IP routing table reduction
- Routing tables of large network routers have over 30,000 entries
- Fast IP routing lookup is difficult without using large hardware resources
Ternary CAM (McAuley & Francis, 1993)
- A TCAM can perform a routing table lookup in a single cycle
- Requires large hardware resources and consumes a lot of power
Mask Extension (Liu, 2002)
- Uses two-level logic minimization to reduce the size of the routing table
- Good results, but did not consider off-chip communication
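The longest-prefix-match rule in the routing example can be sketched in a few lines of C. This is only an illustration of the lookup semantics, not the TCAM or Mask Extension technique: the `Route` struct, `ip4`, `lookup`, and `slide_lookup` are hypothetical names, the linear scan is chosen for clarity, and the prefix lengths (/8, /16, /24) are inferred from the "x" wildcards in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: prefix value, prefix length in bits, next hop. */
typedef struct {
    uint32_t prefix;
    int      len;   /* number of significant leading bits, 0..32 */
    int      port;  /* next-hop port */
} Route;

/* Pack four octets into one 32-bit address. */
uint32_t ip4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | (uint32_t)d;
}

/* Return the port of the longest matching prefix, or -1 if nothing matches. */
int lookup(const Route *table, size_t n, uint32_t dst)
{
    int best_len = -1;
    int best_port = -1;
    for (size_t i = 0; i < n; i++) {
        int len = table[i].len;
        assert(len >= 0 && len <= 32);
        /* A /0 prefix matches everything; handle it separately so we
           never shift a 32-bit value by 32 (undefined behavior). */
        uint32_t mask = (len == 0) ? 0u : (~(uint32_t)0 << (32 - len));
        if ((dst & mask) == (table[i].prefix & mask) && len > best_len) {
            best_len = len;
            best_port = table[i].port;
        }
    }
    return best_port;
}

/* Look up a destination in the three-entry table from the slide. */
int slide_lookup(uint32_t dst)
{
    const Route table[] = {
        { ip4(125, 0, 0, 0),    8, 3 },
        { ip4(138, 23, 0, 0),  16, 5 },
        { ip4(138, 23, 16, 0), 24, 7 },
    };
    return lookup(table, sizeof table / sizeof table[0], dst);
}
```

Calling `slide_lookup(ip4(138, 23, 16, 9))` matches both the 138.23.x.x and 138.23.16.x entries and picks the longer one, so the packet goes to Port 7 as in the figure. Every entry removed by minimization shortens this scan, or frees a TCAM row in a hardware lookup.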
4
On-Chip Minimization Applications (Access Control List Reduction)
Access Control List (ACL)
- Used to restrict IP traffic through network routers
- ACL size can range anywhere from 300 (UCR CS&E Dept.) to 10,000 (AOL)
- Commonly used to block a particular protocol or port number to avoid attacks such as Denial of Service attacks
ACL Minimization
- Similar approach to the one used for IP routing table reduction
- However, the order of the list must be preserved
ACL Input Format (fields): Type, Protocol, In IP, Out Port, In Port, Out IP, Action
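Why list order constrains ACL minimization can be shown with a small sketch. It assumes the common first-match evaluation used by router ACLs, where the first rule that matches a packet decides its fate and unmatched packets are denied. The `Rule` struct is hypothetical and reduced to two fields (protocol and port) rather than the full input format listed above.

```c
#include <assert.h>
#include <stddef.h>

enum { ANY = -1 };                       /* wildcard for a field */
typedef enum { DENY = 0, PERMIT = 1 } Action;

/* Hypothetical, simplified ACL rule: only protocol and port are modeled. */
typedef struct {
    int    protocol;  /* e.g. 6 = TCP, 17 = UDP, or ANY */
    int    port;      /* destination port, or ANY */
    Action action;
} Rule;

/* First-match evaluation: the earliest matching rule decides. */
Action acl_eval(const Rule *acl, size_t n, int protocol, int port)
{
    assert(acl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int proto_ok = acl[i].protocol == ANY || acl[i].protocol == protocol;
        int port_ok  = acl[i].port == ANY || acl[i].port == port;
        if (proto_ok && port_ok)
            return acl[i].action;
    }
    return DENY;  /* assumed implicit deny when no rule matches */
}

/* Returns 1 if swapping two rules changes the verdict for TCP port 23. */
int acl_order_matters(void)
{
    const Rule deny_first[]   = { { 6, 23, DENY },    { 6, ANY, PERMIT } };
    const Rule permit_first[] = { { 6, ANY, PERMIT }, { 6, 23, DENY } };
    return acl_eval(deny_first, 2, 6, 23) != acl_eval(permit_first, 2, 6, 23);
}
```

The two lists hold the same rules, yet TCP port 23 is denied by the first and permitted by the second. A minimizer may therefore merge or drop rules only when the first-match verdict of every packet is unchanged.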
5
On-Chip Minimization Applications (Dynamic Hardware/Software Partitioning)
Dynamic hardware/software partitioning (JIT compilation for FPGAs)
- Dynamically detects frequently executed software loops and reimplements them using on-chip configurable logic
- Requires logic synthesis tools to be embedded on-chip
[Figure: Warp Processor. Labels: MIPS/ARM, I$, D$, Profiler, Configurable Logic, Dynamic Partitioning Module]
6
ROCM
On-chip Logic Minimization Requirements
- Limited data and instruction memory available
- Quality of results must still be close to optimal
- Execution time should remain reasonable
On-chip Logic Minimization Goal
- Focus on developing an on-chip logic minimization tool that produces acceptable results with reasonable increases in execution time while using limited memory resources
ROCM: Riverside On-Chip Minimizer
- Two-level minimization tool
- Utilizes a combination of approaches from Espresso-II (Brayton, et al. 1984) and Presto (Svoboda & White, 1979)
- Eliminates the need to compute the off-set, which reduces memory usage
- Utilizes a single expand phase instead of multiple iterations
- On average, results are only 2% larger than the optimal solution
7
ROCM Results (Performance/Memory Usage)
[Figure: results for a 500 MHz Sun Ultra60 and a 40 MHz ARM7 (Triscend A7)]
- ROCM executing on a 40 MHz ARM7 requires less than 1 second
- Small code size of only 22 kilobytes
- Average data memory usage of only 1 megabyte
8
Codesign ROCM (Hardware Coprocessor)
- A customized ROCM enables us to develop an efficient hardware coprocessor
- Profiled the execution of ROCM-32 and ROCM-128 using the ARM port of the SimpleScalar simulator
- Determined the critical loops/functions that are suitable for implementation in hardware
- Identified six critical kernels that comprised 91% of the total execution time but only 2% of the code size
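As a rough sanity check on what the 91% figure allows, Amdahl's law bounds the overall speedup obtainable when only the profiled kernels are accelerated. The sketch below is not from the slides: it assumes the 91% fraction carries over unchanged to the accelerated design and ignores processor/coprocessor communication overhead. The function names are hypothetical.

```c
#include <assert.h>

/* Overall speedup when a fraction f of the run time is sped up by factor s. */
double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Upper bound on overall speedup as the kernel speedup grows without limit. */
double amdahl_limit(double f)
{
    assert(f >= 0.0 && f < 1.0);
    return 1.0 / (1.0 - f);
}
```

With f = 0.91, `amdahl_limit` gives roughly 11, so under these assumptions no coprocessor limited to those six kernels could push the whole tool much past an 11-fold speedup. The remaining 9% of software run time sets the ceiling.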
9
Codesign ROCM (Minimization Coprocessor)
[Figure: On-Chip Minimizer containing an ARM7, MEM, and the Minimization Coprocessor (Min. Coproc.). The coprocessor has a Proc/Mem Interface (data, addr) and the kernels DoesInter, IsCov, GetLit, SetLit, Tautology.1, and Cofactor.1.]
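The slides do not show what GetLit and SetLit do internally. A plausible reading, sketched below, is that they read and write one literal of an implicant stored in a packed, two-bits-per-variable positional-cube encoding like the one used by Espresso. The encoding values, the 16-literals-per-word layout, and all names here are assumptions, not ROCM's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2-bit literal codes (positional-cube notation). */
enum { LIT_EMPTY = 0, LIT_ZERO = 1, LIT_ONE = 2, LIT_DC = 3 };

/* Read the literal of variable `var` from a cube packed 16 literals per word. */
unsigned get_lit(const uint32_t *cube, unsigned var)
{
    return (cube[var / 16] >> (2 * (var % 16))) & 3u;
}

/* Overwrite the literal of variable `var` with `value`. */
void set_lit(uint32_t *cube, unsigned var, unsigned value)
{
    assert(value <= 3u);
    unsigned shift = 2 * (var % 16);
    uint32_t cleared = cube[var / 16] & ~((uint32_t)3 << shift);
    cube[var / 16] = cleared | ((uint32_t)value << shift);
}

/* Write then read one literal in a 32-variable, all-don't-care cube. */
unsigned lit_round_trip(unsigned var, unsigned value)
{
    uint32_t cube[2] = { 0xFFFFFFFFu, 0xFFFFFFFFu };
    assert(var < 32);
    set_lit(cube, var, value);
    return get_lit(cube, var);
}
```

Each call is a shift, a mask, and possibly an OR. In software that costs several instructions per literal, whereas in hardware it is mostly wiring, which is one reason such small accessors can be attractive coprocessor kernels.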
10
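Codesign ROCM (Minimization Coprocessor)
[Figure: Minimization Coprocessor with the DoesInter (Does Intersect) kernel expanded. Coprocessor labels: Proc/Mem Interface (data, addr), DoesInter, IsCov, GetLit, SetLit, Tautology.1, Cofactor.1. DoesIntersect datapath labels: inputs aImpl, dImpl, numLits; two shifters (<<); 1; 5; 32 (odd); 32 (even); 64; comparison == 0; output retVal.]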
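The datapath labels (odd and even 32-bit halves of a 64-bit value, a comparison against zero) are consistent with an intersection test on implicants encoded with two bits per literal. The sketch below shows one way to write that test in C. It is an assumption about the encoding, not the authors' circuit: 01 means complemented, 10 means true, 11 means don't care, and two implicants intersect when no variable's two-bit field becomes 00 after a bitwise AND.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the low (even) bit of every 2-bit literal field. */
#define EVEN_BITS 0x5555555555555555ULL

/* Return 1 if two implicants of num_lits variables (1..32) share a minterm. */
int does_intersect(uint64_t a, uint64_t b, unsigned num_lits)
{
    assert(num_lits >= 1 && num_lits <= 32);
    uint64_t both = a & b;
    /* Fold each field's odd bit onto its even bit: the even bit of a field
       is now set exactly when that field is non-empty (not 00). */
    uint64_t nonempty = (both | (both >> 1)) & EVEN_BITS;
    /* Only the fields of the first num_lits variables are significant. */
    uint64_t used = (num_lits == 32)
                        ? EVEN_BITS
                        : ((((uint64_t)1 << (2 * num_lits)) - 1) & EVEN_BITS);
    return (nonempty & used) == used;
}
```

For two variables, the cube x0 (fields 10, 11, packed as 0xE) does not intersect x0'·x1 (0x9) because variable 0 is required true by one and false by the other, but it does intersect x0·x1 (0xA). The whole test is a handful of bitwise operations with no loop, which maps naturally onto combinational logic.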
11
Codesign ROCM Results (Execution Time)
Average speedup of 7.8
12
Codesign ROCM Results (Energy Consumption)
Average energy reduction of 59.2%
13
Codesign ROCM (Minimization Coprocessor)
Software modifications were required to achieve the speedup of 7.8
- The original data structures and algorithms were not suitable for hardware implementation
- Reorganized data structures
- Customized the width of data items
- Eliminated memory allocation within critical regions
- These changes are not automated by current hardware/software partitioning tools
14
Codesign ROCM (Minimization Coprocessor)
Original C Code

    for (i = 0; i < numImplicants; i++) {
        if (!DoesIntersect(implicant, xj))
            continue;
        for (k = 0; k < numLiterals; k++) {
            // determine coImplicant
            ...
        }
        AddImplicant(cofactor, &coImplicant);
    }

Annotations on the slide:
- Loop: move to HW, 28.5% of total exec. time
- AddImplicant(cofactor, &coImplicant): only 3.5% of total exec. time, but requires dynamic memory allocation
15
Codesign ROCM (Minimization Coprocessor)
Modified C Code

    // determine size of cofactor initially
    cofactorSize = 0;
    for (i = 0; i < numImplicants; i++) {
        if (!DoesIntersect(implicant, xj))
            continue;
        cofactorSize++;
    }

    // allocate all memory outside of main loop
    cofactor->implicants = malloc(…);

    for (i = 0; i < numImplicants; i++) {
        if (!DoesIntersect(implicant, xj))
            continue;
        for (k = 0; k < numLiterals; k++) {
            // additional initialization code needed for each iteration
            coImplicant = &(cofactor->implicants[index++]);
            ...
        }
    }
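The count-then-allocate pattern above can be fleshed out into a self-contained sketch. Everything beyond the two-pass structure is an assumption made for illustration: the `Implicant` and `Cover` types, the two-bits-per-literal encoding (01 complemented, 10 true, 11 don't care, unused fields 11), the Espresso-style cofactor rule (OR each field with the complement of the matching field of `xj`), and the choice to claim one output slot per retained implicant.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EVEN_BITS 0x5555555555555555ULL

/* Hypothetical types: one implicant packs up to 32 two-bit literals. */
typedef struct { uint64_t bits; } Implicant;
typedef struct { Implicant *implicants; size_t count; } Cover;

/* Two implicants intersect when no 2-bit field of their AND is 00. */
int intersects(Implicant a, Implicant b)
{
    uint64_t both = a.bits & b.bits;
    return ((both | (both >> 1)) & EVEN_BITS) == EVEN_BITS;
}

/* Cofactor of cover f with respect to cube xj. Returns 0 on success. */
int cofactor(const Cover *f, Implicant xj, Cover *out)
{
    size_t size = 0, index = 0;

    /* Pass 1: determine the size of the cofactor. */
    for (size_t i = 0; i < f->count; i++)
        if (intersects(f->implicants[i], xj))
            size++;

    /* Allocate all memory once, outside the main loop. */
    out->count = 0;
    out->implicants = size ? malloc(size * sizeof *out->implicants) : NULL;
    if (size && out->implicants == NULL)
        return -1;

    /* Pass 2: the main loop is now allocation-free. */
    for (size_t i = 0; i < f->count; i++) {
        if (!intersects(f->implicants[i], xj))
            continue;
        Implicant *coImplicant = &out->implicants[index++];
        coImplicant->bits = f->implicants[i].bits | ~xj.bits;
    }
    assert(index == size);
    out->count = size;
    return 0;
}

/* Demo: f = x0*x1' + x0' + x1, cofactored by x0; expect x1' + x1. */
size_t cofactor_demo(void)
{
    const uint64_t hi = ~(uint64_t)0xF;            /* unused fields: 11 */
    Implicant cubes[3] = { { hi | 0x6 }, { hi | 0xD }, { hi | 0xB } };
    Cover f = { cubes, 3 }, out;
    Implicant x0 = { hi | 0xE };
    if (cofactor(&f, x0, &out) != 0)
        return 0;
    size_t n = out.count;
    int ok = n == 2 && out.implicants[0].bits == (hi | 0x7)
                    && out.implicants[1].bits == (hi | 0xB);
    free(out.implicants);
    return ok ? n : 0;
}
```

The price of the rewrite is a second pass over the implicants that calls the intersection test twice per implicant. That trade is reasonable when the test is cheap (or itself in hardware) and when removing `malloc` from the loop body is what makes the loop implementable on a coprocessor.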
16
Conclusions & Future Work
Developed codesigned on-chip logic minimization
- Performance improvement of nearly 8X compared to the earlier software-only implementation
- Energy reduction of almost 60%
New directions in hardware/software partitioning
- Designer effort was required to rewrite algorithms and fine-tune data structures
- Could better hardware/software partitioning tools automate this?