Published by Eileen Ferguson. Modified over 9 years ago.
1
Making Good Points: Application-Specific Pareto-Point Generation for Design Space Exploration Using Rigorous Statistical Methods
David Sheldon, Frank Vahid*
Department of Computer Science and Engineering, University of California, Riverside
*Also with the Center for Embedded Computer Systems at UC Irvine
2
Parameterized Component: Cache
Line concatenation [Zhang/Vahid/Najjar, ISCA 2003, ISVLSI 2003, TECS 2005]
40% avg savings
[Figure: cache way W1 with a counter bus, 16-byte physical lines, and off-chip memory; annotation: "4 physical lines filled when line size is 32 bytes"]
3
David Sheldon, UC Riverside 3 of 19
FPGA Systems are Often Built from Parameterized Components
Parameterized components include:
  Cache (e.g., size, associativity, line size)
  Processors
  Co-processors
  Buses (e.g., bit width, network-on-chip structure)
[Figure: FPGA containing uP, MPEG Enc, Cache, Bus, and DSP blocks, with config inputs]
4
Microblaze Soft-Core Processor: Design Space due to Parameters
520 points, over 10 days, ~35 min per point
<1 min to execute; remaining time was in synthesis and place and route
Pareto points: points where no point exists that is better in all metrics.
[Figure: design-space plot, Cycles vs. Equivalent LUTs]
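The Pareto-point definition above can be made concrete with a small filter. This is a minimal sketch, not the authors' tool: it uses the common dominance test (no worse in every metric and strictly better in at least one), assumes lower is better for both metrics, and the `(cycles, LUTs)` values are made up for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every metric and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cycles, equivalent LUTs) pairs; lower is better for both.
designs = [(100, 900), (120, 700), (150, 650), (130, 800), (200, 650)]
front = pareto_front(designs)
print(front)  # [(100, 900), (120, 700), (150, 650)]
```

Here `(130, 800)` is dropped because `(120, 700)` beats it in both metrics, and `(200, 650)` is dropped because `(150, 650)` needs fewer cycles for the same area.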
5
Pareto Points Differ Per Application and Per Criteria
[Figure: panels (a) and (b), Time vs. Energy plots for App a1 and App a2 on the same platform; labels: Pareto points, configurations c1, c2, c3, Designer A, Designer B]
6
Previous Work: Parameter Interdependency Graph
Platune [Givargis/Vahid 2002]
  Introduced the parameter interdependency graph
  Edges: parameters are dependent
  Nodes not connected: independent
  Search dependent parameters exhaustively; compose local Pareto points into global points
  Greatly reduces the search space if parameters are independent
  Good results, 44 hours
Randomized approaches
  Pareto Simulated Annealing (PSA) [Talarico 2006]: good results, 6 hours
  Genetic Algorithms [Ascia 2005]: good results, 4 hours
[Figure: Platune's architecture: MIPS, I$, D$, MEM, CPU–I$ bus, CPU–D$ bus, $–MEM bus; parameters: size, assoc., line size, code, address code, supply voltage]
7
Our Approach
We developed a Design-of-Experiments (DoE)-based technique to automatically generate a parameter interdependency graph
  Relieves the designer of that burden
Technique generates Pareto points from the parameter interdependency graph using an edge-weight-based algorithm
  Improves speed versus Platune
Called DoE-Based Pareto-Point Generator (DPG)
[Figure: Time vs. Performance plot]
8
Design of Experiments (DoE)
DoE generates a set of orthogonal experiments that allows for statistical analysis of the search space
[Figure: experiment table over the parameters i$ size, i$ assoc, d$ size, d$ line, d$ assoc, m-i$ code, m-i$ a code, $-m code, and supply voltage, with sample values 2k, 8, 8k, 8, 32, Bi, 4.1, beside Platune's architecture diagram]
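To illustrate what "orthogonal experiments" means, here is a minimal sketch of a textbook two-level design. The slides do not say which DoE design DPG uses, so the construction below (three base factors plus all of their product columns, giving 8 runs for 7 factors) is an assumption chosen only because it is easy to verify. Levels are coded as -1 and +1, and the function names are hypothetical.

```python
from itertools import combinations, product

def two_level_orthogonal_design(base_factors=3):
    """Build a saturated two-level design coded as -1/+1.

    Columns are the base factors plus the product of every larger subset
    of them, so 3 base factors yield 8 runs covering 7 factors.
    """
    subsets = [s for r in range(1, base_factors + 1)
               for s in combinations(range(base_factors), r)]
    runs = []
    for levels in product((-1, 1), repeat=base_factors):
        row = []
        for subset in subsets:
            value = 1
            for i in subset:
                value *= levels[i]
            row.append(value)
        runs.append(row)
    return runs

def is_orthogonal(runs):
    """Every pair of columns must have a zero dot product."""
    columns = list(zip(*runs))
    return all(sum(a * b for a, b in zip(c1, c2)) == 0
               for c1, c2 in combinations(columns, 2))

design = two_level_orthogonal_design(3)
print(len(design), "runs x", len(design[0]), "factors; orthogonal:", is_orthogonal(design))
```

Orthogonality is what lets the effect of each parameter be estimated separately from the others, using far fewer runs than the 128 of an exhaustive two-level sweep over 7 factors.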
9
DPG Algorithm
Subsequent DoE analysis determines main effects of parameters
[Figure: main effects for i$ size, i$ assoc, d$ size, d$ line, d$ assoc, m-i$ code, m-i$ a code, $-m code, and supply voltage, beside Platune's architecture diagram]
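A main effect, in standard two-level DoE terms, is the mean response with a factor at its high level minus the mean response with it at its low level. The sketch below shows that calculation on made-up data; the factors `a`, `b`, `c` and the response formula are hypothetical and are not taken from the presentation.

```python
from itertools import product

def main_effects(runs, responses):
    """Main effect of each factor: mean response at +1 minus mean response at -1."""
    effects = []
    for k in range(len(runs[0])):
        high = [y for run, y in zip(runs, responses) if run[k] == 1]
        low = [y for run, y in zip(runs, responses) if run[k] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

# Hypothetical data: a full two-level factorial over three factors (a, b, c)
# and a made-up response in which c has no influence at all.
runs = list(product((-1, 1), repeat=3))
responses = [10 + 3 * a - 2 * b + 1.5 * a * b for a, b, c in runs]

effects = main_effects(runs, responses)
print(dict(zip("abc", effects)))  # {'a': 6.0, 'b': -4.0, 'c': 0.0}
```

Because the design is balanced, the `a * b` interaction term averages out of every main effect, which is why `c` comes out as exactly zero.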
10
DPG Algorithm (cont.)
Compute weight of each pair of nodes
Sort edges in decreasing weight
  DK (I$ assoc, CPU–I$ address code)
  DI (I$ assoc, CPU–I$ code)
  IK (CPU–I$ code, CPU–I$ address code)
  IQ (CPU–I$ code, $–MEM address code)
  KQ (CPU–I$ address code, $–MEM address code)
  ...
[Figure: Platune's architecture diagram]
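The slides do not state how the weight of a node pair is computed. The sketch below assumes, purely for illustration, that the weight is the magnitude of the standard two-factor interaction effect, then sorts the edges in decreasing weight. The node labels D, I, and K are borrowed from the list above, but the response values are invented.

```python
from itertools import combinations, product

def interaction_effect(runs, responses, i, j):
    """Mean response where factors i and j agree minus mean where they differ."""
    same = [y for run, y in zip(runs, responses) if run[i] * run[j] == 1]
    diff = [y for run, y in zip(runs, responses) if run[i] * run[j] == -1]
    return sum(same) / len(same) - sum(diff) / len(diff)

# Hypothetical data: full two-level factorial over nodes D, I, K with a made-up
# response whose strongest interaction is D-K, then D-I, then I-K.
names = ["D", "I", "K"]
runs = list(product((-1, 1), repeat=3))
responses = [5 + 4 * d * k + 2 * d * i + 1 * i * k for d, i, k in runs]

# Assumed edge weight: absolute interaction effect (not confirmed by the slides).
edges = sorted(
    ((abs(interaction_effect(runs, responses, a, b)), names[a] + names[b])
     for a, b in combinations(range(len(names)), 2)),
    reverse=True,
)
print(edges)  # [(8.0, 'DK'), (4.0, 'DI'), (2.0, 'IK')]
```

Any other pairwise weight could be dropped in at the marked line; only the sort in decreasing order is taken from the slide.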
11
DPG Algorithm (cont.)
Pairwise merge of nodes
  Creates a sparse set of Pareto points
The designer can direct the tool to fill in the regions of interest
[Figure: Time vs. Energy plot showing original Pareto points and filled-in Pareto points]
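One way to picture a pairwise merge is sketched below. It is a guess at the general shape, not the authors' algorithm: edges are visited in a given order, the partial configurations of the two clusters are crossed, and only the non-dominated combinations survive. The parameter names, the `evaluate` cost model, and the `defaults` used for parameters not yet merged are all hypothetical.

```python
from itertools import product

def dominates(p, q):
    """p is no worse than q in every metric and differs from it (lower is better)."""
    return all(a <= b for a, b in zip(p, q)) and p != q

def pareto_configs(configs, evaluate):
    """Keep the configurations whose metrics no other configuration dominates."""
    scored = [(evaluate(c), c) for c in configs]
    return [c for m, c in scored if not any(dominates(o, m) for o, _ in scored)]

def merge_by_edges(levels, sorted_edges, evaluate, defaults):
    """Merge parameter clusters pairwise, visiting edges in the given order."""
    owner = {p: p for p in levels}  # parameter -> cluster it belongs to
    partial = {p: [{p: v} for v in vals] for p, vals in levels.items()}
    for a, b in sorted_edges:
        ca, cb = owner[a], owner[b]
        if ca == cb:
            continue  # already in the same cluster
        combined = [{**x, **y} for x, y in product(partial[ca], partial.pop(cb))]
        # Unmerged parameters are held at their default values while scoring.
        partial[ca] = pareto_configs(combined, lambda c: evaluate({**defaults, **c}))
        for p in owner:
            if owner[p] == cb:
                owner[p] = ca
    return partial

def evaluate(c):
    """Made-up (time, energy) model, for illustration only."""
    time = 16 / c["size"] + 4 / c["assoc"] + 8 / c["volt"]
    energy = c["size"] + c["assoc"] + 2 * c["volt"] ** 2
    return (time, energy)

levels = {"size": [2, 8], "assoc": [1, 2], "volt": [1, 2]}
defaults = {"size": 2, "assoc": 1, "volt": 1}
edges = [("size", "assoc"), ("assoc", "volt")]  # assumed already sorted by weight

clusters = merge_by_edges(levels, edges, evaluate, defaults)
front = clusters["size"]
for config in front:
    print(config, evaluate(config))
```

In this toy model 6 of the 8 full configurations survive the two merges. The result is sparse because pruning happens at every merge, which matches the slide's point that the designer may later ask for regions of interest to be filled in.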
12
Platune: Pareto Graph with Fill-in (jpeg)
[Figure: Pareto graph for the jpeg benchmark]
13
Platune: Pareto Graph with Fill-in (b1_histogram)
[Figure: Pareto graph for the b1_histogram benchmark]
14
Interdependency Graph Comparison: Manual vs. Automated
[Figure: interdependency graphs for jpeg, b1_histogram, and g3fax]
15
Platune Results
DPG is 30x faster than Platune
2.5x faster than Genetic Algorithms
[Figure: run-time chart; one data label reads 44]
16
Xilinx Microblaze Soft-Core Processor
Tuned the Microblaze for various benchmarks
Exhaustive data generated for 12 benchmarks for comparison
The Microblaze also has a configurable cache, which allows for over 3,000 configurations. For these tests we used previously generated results, which gave us only 64 configurations.
[Figure: Microblaze with configurable units bs, FPU, div, mul, MSR, PCMP]
17
Network on Chip: Results
DPG also works on larger design spaces
18
DPG Scales Well

Number of Parameters   DPG Analysis Phase   Total Design Space   Percent of Design Space
 6                      34                  64                   53.13%
10                      67                  1,024                6.54%
15                     136                  32,768               0.42%
20                     234                  1,048,576            0.02%
25                     353                  33,554,432           0.001%
30                     497                  1,073,741,824        0.00005%
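The table's arithmetic can be reproduced in a few lines. Each "Total Design Space" entry equals 2 raised to the number of parameters, which is consistent with two settings per parameter; that reading is inferred from the numbers rather than stated in the slides. The function name below is hypothetical.

```python
# (number of parameters, DPG analysis-phase experiments) as listed in the table
rows = [(6, 34), (10, 67), (15, 136), (20, 234), (25, 353), (30, 497)]

def design_space_share(num_params, experiments, levels=2):
    """Return (total design space, percent of it that was evaluated).

    levels=2 is an assumption inferred from the totals, not stated in the slides.
    """
    total = levels ** num_params
    return total, 100.0 * experiments / total

for n, runs in rows:
    total, percent = design_space_share(n, runs)
    print(f"{n:>2} parameters: {runs:>3} of {total:>13,} points = {percent:.5f}%")
```

The experiment count grows from 34 to 497 while the design space grows from 64 to over a billion points, which is the scaling claim the slide makes.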
19
Conclusion
DoE-Based Pareto-Point Generation (DPG) algorithm quickly finds good Pareto points
Results were better and obtained faster than with Platune or randomized techniques
Approach is easier to use: no designer knowledge of parameter interdependencies is needed
Useful for FPGAs as well as other parameterized systems, such as SOCs synthesized to ASICs, parameterized SOCs, etc.