University of Michigan Electrical Engineering and Computer Science
Processor Acceleration Through Automated Instruction Set Customization
Nathan Clark, Hongtao Zhong, Scott Mahlke
Advanced Computer Architecture Lab
University of Michigan, Ann Arbor
December 3, 2003

Motivation
Cell phones, PDAs, digital cameras, etc. are everywhere
  – High performance yet low power design point
General core + ASIC solution
  – Limited post-programmability
General core + application specific instructions (CFUs)
[Figure: CPU paired with an ASIC versus CPU paired with a CFU]

What is a CFU?
Combine multiple primitive operations
  – Smaller code size, fewer RF reads
  – Increases performance
[Figure: dataflow graph of primitive operations (&, |, <<, ^, *, +) in which two subgraphs are collapsed into CFU 1 and CFU 2]

Automation is Key
This is ¼ of the DFG for a single basic block of blowfish
[Figure: dataflow graph excerpt; visible node labels include 159 XOR, 164 SHR, 173 AND]

Related Work
Tensilica Xtensa
  – Commercial example
  – MIPS core + manually constructed CFU
Automatic instruction set synthesis is a mature field
  – See paper for comparison of techniques
Our contributions
  – Novel technique for automatic CFU creation
  – System to utilize CFUs in multiple applications
  – Analysis of how effectively CFUs for one application apply to other applications in the same domain

System Overview
Synthesis
  – Subgraph identification
      Discover candidates for CFUs
      Weed out what shouldn’t be picked
  – Selection
      Determine which candidates to use as CFUs
Compilation
  – Subgraph replacement
      Make use of the CFUs in a range of applications

Subgraph Identification
Grow subgraphs from seed nodes
  – All nodes are seeds
  – Most directions don’t make sense
How to decide where to grow?
  – Making decisions using factors similar to an architect
  – Take 4 factors into consideration
      Criticality, Latency, Area, Input/Output
Sum of these factors determines value of each direction
  – NOT picking CFUs
[Figure: example DFG with nodes %, ^, <<, +, *, &, | and a growing list of CFU candidates built from &, << and +]

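The growth step described on this slide can be sketched as a worklist over subgraphs. This is a minimal illustration, not the authors' implementation: the `score` callback stands in for the sum of the four factors, and the `threshold` and `max_size` cut-offs are assumptions introduced here to decide when a direction "doesn't make sense" and when to stop.

```python
from typing import Callable, Dict, FrozenSet, Set

Graph = Dict[str, Set[str]]  # node -> neighbouring nodes (undirected view of the DFG)


def grow_candidates(
    graph: Graph,
    score: Callable[[FrozenSet[str], str], float],  # hypothetical guide function
    threshold: float,                               # assumed cut-off for a direction
    max_size: int,                                  # assumed external size limit
) -> Set[FrozenSet[str]]:
    """Collect candidate subgraphs by growing from every node."""
    candidates: Set[FrozenSet[str]] = set()
    worklist = [frozenset([n]) for n in graph]  # all nodes are seeds
    while worklist:
        sub = worklist.pop()
        if sub in candidates:
            continue  # already explored this subgraph
        candidates.add(sub)
        if len(sub) >= max_size:
            continue  # stop growing once the size limit is reached
        # Directions are the neighbours of the subgraph that are not yet in it.
        frontier = set().union(*(graph[n] for n in sub)) - sub
        for nxt in frontier:
            if score(sub, nxt) >= threshold:  # only follow promising directions
                worklist.append(sub | {nxt})
    return candidates


# Tiny chain a - b - c; growing toward "c" is scored as a poor direction.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
found = grow_candidates(chain, lambda sub, n: 0.0 if n == "c" else 10.0, 5.0, 3)
```

Because low-scoring directions are never expanded, the number of subgraphs explored stays well below the number of connected subgraphs of the DFG.
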
Critical Path
Combining operations on the critical path will shrink the longer dependence chains
  – Maximize potential performance gain
Wt = 10 / (slack + 1)
  – Slack is # cycles off longest dependence path
Examples: 10/(0+1) = 10, 10/(2+1) = 3.33
[Figure: example DFG of ^, &, >>, << and + nodes annotated with these weights]

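As a rough illustration of the slack term, the sketch below computes slack for every node of a small DAG and turns it into a weight of 10/(slack+1), the form the slide's examples follow. Unit latency per operation and the helper names `slack_per_node` and `criticality_weight` are assumptions made for this example.

```python
from functools import lru_cache
from typing import Dict, List


def slack_per_node(succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Slack = how many cycles a node sits off the longest dependence chain.

    Assumes every operation takes one cycle and every node is a key of succs.
    """
    preds: Dict[str, List[str]] = {n: [] for n in succs}
    for node, outs in succs.items():
        for out in outs:
            preds[out].append(node)

    @lru_cache(maxsize=None)
    def depth(n: str) -> int:  # ops on the longest chain ending at n
        return 1 + max((depth(p) for p in preds[n]), default=0)

    @lru_cache(maxsize=None)
    def height(n: str) -> int:  # ops on the longest chain starting at n
        return 1 + max((height(s) for s in succs[n]), default=0)

    longest = max(depth(n) for n in succs)
    return {n: longest - (depth(n) + height(n) - 1) for n in succs}


def criticality_weight(slack: int) -> float:
    return 10.0 / (slack + 1)


# Chain a -> b -> c -> d is critical; e feeds d from a short side path.
dag = {"a": ["b"], "b": ["c"], "c": ["d"], "e": ["d"], "d": []}
slack = slack_per_node(dag)
```

Here node `a` has slack 0 and node `e` has slack 2, which reproduces the two weights shown on the slide.
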
Latency
Growing toward low latency operations allows combination of more nodes in a cycle
  – Maximize DFG compression
Wt = 10 * (old latency / new latency)
Examples: 10*0.3 / 0.6 = 5, 10*0.3 / 0.36 = 8.33

Opcode  Area  Cycles
+       1.00  0.30
&       0.12  0.06
>       0.01  ~0.00
^       0.16  0.09

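The two worked numbers on this slide are consistent with a ratio of the candidate's latency before and after adding a node, scaled by 10. The sketch below assumes that reading, and also assumes the new node is chained serially so that latencies add; `latency_weight` is a name chosen for this example.

```python
from typing import Dict, List

# Fraction of a clock cycle per opcode, taken from the table on the slide.
CYCLES: Dict[str, float] = {"+": 0.30, "&": 0.06, "^": 0.09}


def latency_weight(current_ops: List[str], new_op: str) -> float:
    """Weight for growing a candidate made of current_ops toward new_op."""
    old_latency = sum(CYCLES[op] for op in current_ops)
    new_latency = old_latency + CYCLES[new_op]  # assumes the ops are chained
    return 10.0 * old_latency / new_latency


toward_add = latency_weight(["+"], "+")  # 10 * 0.3 / 0.6
toward_and = latency_weight(["+"], "&")  # 10 * 0.3 / 0.36
```

Growing from an add toward the cheap `&` scores higher than growing toward another add, which matches the slide's point that low latency operations are preferred.
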
Area
Want the most benefit for the least area
Wt = 10 * (old area / new area)
Area is the sum of macrocell areas
Examples: 10*0.5/0.5 = 10, 10*0.5/1.5 = 3.33

Opcode  Area  Cycles
+       1.00  0.30
&       0.12  0.06
>       0.01  ~0.00
^       0.16  0.09

Input/Output
Want CFUs to use as few RF ports as possible
  – Smaller encoding
  – Allow growth of larger candidates
Wt examples: 10*2/(2+1) = 6.67, 10*2/(4+1) = 4
[Figure: example DFG of ^, &, >>, << and + nodes annotated with these weights]

Example
[Figure: animation over the example DFG (^, &, >>, <<, + nodes) showing one candidate absorbing a node per frame. The first three frames label the growth directions with numbers: 35, 28.5, 37.5, 30.8, 28.5, 37.5; then 35, 28.5, 33.5, 30.8, 28.5, 40; then 35, 28.5, 36, 30.8, 28.5, 36.]
Finished – Met External Constraints

Set of Candidates
[Figure: the candidate subgraphs collected during growth, from single nodes (^, <<) to progressively larger combinations of ^, & and +]

Avoids Exponential Explosion
[Figure: chart with a Speedup axis; tick labels 1.00, 1.13, 1.25, 1.38, 1.50]

Greedy Selection Heuristic
Use estimates of performance improvement / cost

Initial table:
Subgraph Number  Value  Cost  Ops
1                20     4     (3,4),(6,8)
2                6      1     (1,3,7)
…                …      …     …
N                9      5     (1,7)

Updated table:
Subgraph Number  Value  Cost  Ops
1                10     4     (6,8)
2                6      1     (1,3,7)
…                …      …     …
N                0      5

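The two tables can be reproduced with a small greedy loop: pick the subgraph with the best value/cost ratio, mark its operations as covered, and re-value the rest. This is a sketch under assumptions. The per-occurrence gain (`gain_per_use`), the `budget` limit, and the rule that an occurrence stops counting once any of its operations is covered are all inferred from the table values rather than stated on the slide.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Candidate:
    name: str
    cost: float                         # e.g. area charged for the CFU
    gain_per_use: float                 # assumed gain for each occurrence
    occurrences: List[Tuple[int, ...]]  # operation ids each occurrence covers

    def value(self, covered: Set[int]) -> float:
        # Only occurrences whose operations are all still uncovered count.
        usable = [occ for occ in self.occurrences if covered.isdisjoint(occ)]
        return self.gain_per_use * len(usable)


def greedy_select(candidates: List[Candidate], budget: float) -> List[str]:
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining:
        usable = [c for c in remaining
                  if c.cost <= budget and c.value(covered) > 0]
        if not usable:
            break
        best = max(usable, key=lambda c: c.value(covered) / c.cost)
        chosen.append(best.name)
        budget -= best.cost
        for occ in best.occurrences:
            if covered.isdisjoint(occ):
                covered.update(occ)  # these operations now belong to `best`
        remaining.remove(best)
    return chosen


table = [
    Candidate("1", cost=4, gain_per_use=10, occurrences=[(3, 4), (6, 8)]),
    Candidate("2", cost=1, gain_per_use=6, occurrences=[(1, 3, 7)]),
    Candidate("N", cost=5, gain_per_use=9, occurrences=[(1, 7)]),
]
order = greedy_select(table, budget=5)
```

Subgraph 2 has the best ratio (6/1), so it is taken first. Covering operations 1, 3 and 7 drops subgraph 1 to a value of 10 and subgraph N to 0, as in the updated table.
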
Multiple applications can utilize CFUs
Vflib pattern matcher [Cor ’99]
[Figure: Instruction Synthesis produces a CFU Description for the Compiler; Compiler Replacement maps numbered DFG nodes onto a CFU]

Experimental Setup
Implemented in the Trimaran toolset
Baseline machine: 1 Int, 1 Flt, 1 Br, 1 Mem/Cycle
  – CFUs use Int issue slot
CFU latency/area generated as sum of each individual macrocell
  – Pipeline latches were added if CFU latency >1 clock cycle
  – 300 MHz clock assumed
  – No branch or memory instructions in CFUs
Four application domains tested
  – Audio, Encryption, Image, Network

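The latency/area estimate can be illustrated with the macrocell numbers from the earlier Latency and Area slides. This sketch takes the slide literally and sums every macrocell in the CFU. The latch count of one per extra clock cycle is an assumption, and `estimate_cfu` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Tuple

# opcode -> (area, fraction of a clock cycle), from the macrocell table.
MACROCELLS: Dict[str, Tuple[float, float]] = {
    "+": (1.00, 0.30),
    "&": (0.12, 0.06),
    "^": (0.16, 0.09),
}


def estimate_cfu(ops: List[str]) -> Tuple[float, float, int]:
    """Return (area, latency in cycles, assumed number of pipeline latches)."""
    area = sum(MACROCELLS[op][0] for op in ops)
    latency = sum(MACROCELLS[op][1] for op in ops)
    # Round first so float noise cannot push an exact cycle count upward.
    cycles = max(1, math.ceil(round(latency, 6)))
    latches = cycles - 1  # assumption: one latch stage per extra cycle
    return area, latency, latches


small = estimate_cfu(["+", "&"])            # fits in one cycle
large = estimate_cfu(["+", "+", "+", "+"])  # four chained adds exceed a cycle
```

A CFU of one add and one AND stays under a cycle and needs no latch, while four adds total about 1.2 cycles and would be pipelined.
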
Native Encryption Results

Encryption Cross Compile

Generalizing CFUs
Subsumed (Multiple Paths)
Wildcards (Multiple Nodes)
[Figure: a CFU built from >>, | and + with inputs IN_1, IN_2 and constants 0x8, 0xF; a variant whose constant inputs may also be 0x0 (0x8, 0x0 and 0xF, 0x0); and a variant with >>, & and nodes that accept alternative opcodes (|,& and +,-)]

Effects of Generalization
[Figure: chart of Speedup (axis 1.0 to 2.0) with series CFUs and Subsumed Subgraphs for blowfish, bfish-rijn, bfish-sha, rijndael, rijn-bfish, rijn-sha, sha, sha-bfish, sha-rijn]

Conclusions
Developed two phase instruction set synthesis system
  – Guide function removes bad candidates
  – Greedy selection heuristic
Substantial speedups can be attained with very little die impact
Subsumed subgraphs and wildcarding increase cross-application effectiveness

Domain        Encryption  Network  Image  Audio
Ave. Speedup  1.61        1.38     1.16   1.66

Questions?
http://cccp.eecs.umich.edu

Backup slides

Individual Factors - Blowfish

Individual Factors - Djpeg

Selection
Uses estimates of performance improvement
Greedy Heuristic used
[Figure: example DFG of ^, &, >>, << and + nodes]