1
Hot Chips 16, August 24, 2004
OptimoDE: Programmable Accelerator Engines Through Retargetable Customization
Nathan Clark, Hongtao Zhong, Kevin Fan, Scott Mahlke
CCCP Research Group, University of Michigan
http://cccp.eecs.umich.edu
Krisztián Flautner, Koen Van Nieuwenhove
ARM Limited
2
OptimoDE Overview
OptimoDE
– A configurable VLIW-styled Data Engine architecture
– Targeted at intensive data processing
Characteristics
– Very wide performance envelope
    Power / area / speed tradeoff
    Exploiting parallelism in applications
– Unlimited data path configuration options
– User extensible through ISA customization
Semi-automatic design system
– User-in-the-loop design, retargetable compiler toolchain
3
OptimoDE in a System On Chip
[Figure: SoC block diagram. Labels: SDRAM, AHB Bus Matrix, ARM CPU, DMA Controller, Interrupt Controller, SRAM, I/O, SDRAM Controller, Memory Control, Data Engine (DATA 1, DATA 2, MEM CTRL, FIFO, switch), with bus ports marked M and S.]
4
OptimoDE Architecture Model
Functional Units
– ALU, ACU, Multipliers
– Custom
Memory
– RAM (asynch / synch)
– ROM
I/O ports
– addressable
– handshake protocol
Registers
– Register files
Interconnect
– Direct connection
– Shared bus
Controller
All layers required
Intra-layer configuration
[Figure: layer diagram. Labels: Controller, interconnect, Function units, Memories, I/O ports, Registers.]
5
Design Toolchain
[Figure: design flow diagram. Tool labels: DesignDE, DEvelop, Librarian, ISS. Step labels: set target, create_resource, instantiate, save, load, run / profile. Artifact labels: A.tmp, A.inc, A.inc.c, User Library, OptimoDE Library, A Definition, A Evaluation. The example program shown is: #include … main() { … = dct(); }]
6
Compiler Toolchain
Input: C source description; tool: DEvelop; output: microcode
Steps: check, map, compile, with analysis feedback
– Syntax checks
– Dataflow analysis
– Match architecture and dataflow graph
– Optimize code and register use
[Figure: example dataflow graph of numbered multiply (*) and add (+) operations between INPUT and OUTPUT, shown next to schedules with columns numbered 0 to 5.]
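The "match architecture and dataflow graph" step can be pictured as placing dataflow operations into cycles on a limited set of functional units. The sketch below is a minimal illustration under stated assumptions, not the DEvelop algorithm: it uses a simple cycle-by-cycle list scheduler, single-cycle operations, and a hypothetical four-operation graph. The names `list_schedule`, `ops`, and `deps` are invented for this example.

```python
def list_schedule(ops, deps, units):
    """Assign each operation to a cycle (hypothetical helper).

    ops:   {op_name: unit_kind}
    deps:  {op_name: [predecessor op_names]}
    units: {unit_kind: number of units available per cycle}
    Assumes every operation takes exactly one cycle.
    """
    cycle_of = {}
    remaining = dict(ops)
    cycle = 0
    while remaining:
        free = dict(units)          # units still free in this cycle
        issued = []
        for name in sorted(remaining):
            kind = remaining[name]
            # An op is ready once all predecessors finished in an earlier cycle.
            ready = all(p in cycle_of and cycle_of[p] < cycle
                        for p in deps.get(name, []))
            if ready and free.get(kind, 0) > 0:
                free[kind] -= 1
                issued.append(name)
        if not issued:
            raise ValueError("nothing can issue: cyclic deps or missing unit")
        for name in issued:
            cycle_of[name] = cycle
            del remaining[name]
        cycle += 1
    return cycle_of


# Hypothetical graph: two multiplies feed an add, which feeds another add.
ops = {"m1": "mul", "m2": "mul", "a1": "add", "a2": "add"}
deps = {"a1": ["m1", "m2"], "a2": ["a1"]}

narrow = list_schedule(ops, deps, {"mul": 1, "add": 1})
wide = list_schedule(ops, deps, {"mul": 2, "add": 1})
```

With one multiplier the two multiplies serialize and the schedule takes four cycles; with two multipliers it takes three. This is the kind of power / area / speed tradeoff the retargetable compiler lets a designer explore.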
7
32-point DCT Microarchitecture
2 Custom FUs, 2 RAM, 1 ROM, 3 ACU, 2 I/O ports
Designer responsible for creating custom units manually
8
Retargetable Customization
Prototype 2 technologies in OptimoDE
– Automated ISA customization
– Retargetable customization to an “application-area”
Customizing for 1 application
– Programmability
    Nominally programmable
– Critical problem: cannot sustain performance across similar applications
– How well does a custom ISA generalize?
    5 encryption algorithms, create custom design for each
    Average loss >80% versus native [MICRO, 2003]
– Proactive generalization creates a retargetable design
9
Creating Custom Instructions
Candidate discovery
– Identify customization opportunities
Examine program DFG
Partition DFG at:
– Memory operations
– Unprofitable edges
Enumerate candidate subgraphs within each partition
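The discovery steps above can be sketched in a few lines. This is a minimal illustration, assuming an undirected view of the DFG, exhaustive enumeration of small connected subgraphs, and a hypothetical five-node graph. The function names, the `max_size` limit, and the choice of which edge is "unprofitable" are assumptions made for the example, not details from the talk.

```python
from itertools import combinations


def partition(nodes, edges, memory_ops, unprofitable):
    """Drop memory operations and cut unprofitable edges, then return
    the connected components (partitions) and the adjacency map."""
    keep = [n for n in nodes if n not in memory_ops]
    adj = {n: set() for n in keep}
    for a, b in edges:
        if a in adj and b in adj and (a, b) not in unprofitable:
            adj[a].add(b)
            adj[b].add(a)
    parts, seen = [], set()
    for n in keep:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:                      # depth-first walk of one component
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        parts.append(comp)
    return parts, adj


def is_connected(subset, adj):
    """True if the nodes in subset form one connected subgraph."""
    subset = set(subset)
    stack, seen = [next(iter(subset))], set()
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend((adj[cur] & subset) - seen)
    return seen == subset


def candidates(part, adj, max_size=3):
    """Enumerate connected subgraphs of 2..max_size nodes in one partition.
    Exhaustive, so only practical for small partitions."""
    found = []
    for k in range(2, max_size + 1):
        for combo in combinations(sorted(part), k):
            if is_connected(combo, adj):
                found.append(combo)
    return found


# Hypothetical DFG: load -> shift -> or -> add -> store.
nodes = ["ld", "shl", "or", "add", "st"]
edges = [("ld", "shl"), ("shl", "or"), ("or", "add"), ("add", "st")]
parts, adj = partition(nodes, edges,
                       memory_ops={"ld", "st"},
                       unprofitable={("or", "add")})
```

Here the load and store bound the region, the cut edge isolates `add`, and the only multi-node candidate left is the shift/or pair.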
10
Grouping and Selection
Group candidate subgraphs with same structure
[Figure: candidate subgraphs labeled Group 1 to Group 4]
Estimate performance and cost for each group
– Cost: 0.5 Adders, Gain: 1,000 Cycles
– Cost: 1 Adder, Gain: 2,500 Cycles
– Cost: 2 Adders, Gain: 10,000 Cycles
– Cost: 1 Adder, Gain: 1,500 Cycles
Greedily select groups to implement in hardware subject to budget
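The selection step can be sketched as a greedy pass over the per-group estimates. The slide does not say how groups are ranked, so the sketch below assumes ranking by gain per unit cost. The labels A to D are placeholders because the mapping from estimates to group numbers is not recoverable from the slide, and the 3-adder budget is hypothetical.

```python
def greedy_select(groups, budget):
    """Pick groups in descending gain/cost order while they fit the budget.

    groups: list of (label, cost_in_adders, gain_in_cycles)
    Returns (chosen labels, total cost, total gain).
    """
    chosen, spent, gained = [], 0.0, 0
    ranked = sorted(groups, key=lambda g: g[2] / g[1], reverse=True)
    for label, cost, gain in ranked:
        if spent + cost <= budget:      # skip anything that no longer fits
            chosen.append(label)
            spent += cost
            gained += gain
    return chosen, spent, gained


# Cost/gain pairs from the slide; labels A-D are placeholders.
groups = [
    ("A", 0.5, 1_000),
    ("B", 1.0, 2_500),
    ("C", 2.0, 10_000),
    ("D", 1.0, 1_500),
]

chosen, spent, gained = greedy_select(groups, budget=3.0)
```

With a 3-adder budget this picks C and then B, for an estimated gain of 12,500 cycles. Greedy selection is fast but not guaranteed optimal; an exact knapsack solver could be swapped in if the number of groups is small.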
11
Proactively Generalize Groups
Cost-effectively extend group functionality to enable reuse
– Wildcard: multiple functionality at nodes
– Subsumed: configurable interconnect to bypass nodes
[Figure: an original subgraph (Input 1, Input 2, constants 0xFF and 0x8, operations >>, |, +, Output) next to generalized versions labeled with constants 0xFF, 0x4 and "0x8, 0x4" and operations >>, "|,&" and "+,-".]
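The two generalizations can be modeled with a small configurable unit. This is only a sketch: the exact wiring of the slide's subgraph is not recoverable from the transcript, so the linear chain (shift, then logical op, then add/subtract) is an assumption, as are the names `Node` and `run_chain` and the 32-bit width. A wildcard node accepts more than one operation; a subsumed node is bypassed by passing `None` as its operation.

```python
import operator

OPS = {
    ">>": operator.rshift,
    "|": operator.or_,
    "&": operator.and_,
    "+": operator.add,
    "-": operator.sub,
}
MASK32 = 0xFFFFFFFF  # assumed 32-bit datapath


class Node:
    """One node of a custom function unit (hypothetical model)."""

    def __init__(self, allowed):
        self.allowed = set(allowed)     # more than one entry = wildcard node

    def apply(self, op, value, operand):
        if op is None:                  # subsumed: interconnect bypasses node
            return value
        if op not in self.allowed:
            raise ValueError(f"node does not support {op!r}")
        return OPS[op](value, operand) & MASK32


def run_chain(nodes, config, value, operands):
    """Feed value through the chain using one op choice per node."""
    for node, op, operand in zip(nodes, config, operands):
        value = node.apply(op, value, operand)
    return value


# Original unit: fixed operations only.
original = [Node([">>"]), Node(["|"]), Node(["+"])]
# Generalized unit: wildcard nodes accept an extra operation each.
generalized = [Node([">>"]), Node(["|", "&"]), Node(["+", "-"])]

same_as_original = run_chain(generalized, (">>", "|", "+"), 0x1234, (0x8, 0xFF, 1))
wildcard_use = run_chain(generalized, (">>", "&", "-"), 0x1234, (0x4, 0xFF, 1))
bypass_use = run_chain(generalized, (None, "&", "+"), 0x1234, (0, 0xFF, 1))
```

The generalized unit still covers the original pattern, and it also serves a second pattern that uses `&` and `-` and a third that skips the shift entirely, which is the reuse across similar applications that the slide is after.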
12
Native Speedups
13
Importance of Generalization
Key: application run – application designed for
14
Designing for a Domain
15
Case Study: MD5
16
OptimoDE Design for this Point
[Figure: two custom function unit dataflow graphs. One is labeled Input 1 to Input 4, three + nodes, << 0x5, >> 0x1B, | and Output; the other is labeled Input 1 to Input 3, ^, &, ^ and Output. Datapath labels: ALU 1, ALU 2, CFU, SRAM, ACU, RF, …, Control Memory.]
17
Die Area Breakdown
[Figure: die area split across ALU 1, ALU 2, CFU, SRAM, ACU, RF, …, Control Memory]
Die impact is artificially large because of naïve implementation
OptimoDE = 5.5 mm² in 0.13 µm
ARM 926EJ = 5.0 mm² in 0.13 µm
18
Conclusions
OptimoDE
– Configurable VLIW-style data engine architecture
– Automated tools for implementing embedded signal and data processing solutions
Automatic retargetable customization
– Customized design combined with cost-effective generalization
– Performance programmability: performance stability across a family of similar applications