1
Area-Efficient Instruction Set Synthesis for Reconfigurable System on Chip Designs
Philip Brisk, Adam Kaplan, Majid Sarrafzadeh
Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles
philip@cs.ucla.edu, majid@cs.ucla.edu, kaplan@cs.ucla.edu
DAC ’04. June 9, 2004. San Diego Convention Center, San Diego, CA
2
Outline
- Custom Instruction Generation and Selection
- Resource Sharing Algorithm: Description with Examples
- Datapath Synthesis Techniques
- Experimental Methodology and Results
- Summary
3
Custom Instruction Generation and Selection
Custom Instruction Generation
- The compiler profiles the application code
- Extracts favorable IR patterns
- Synthesizes the patterns as hardware datapaths
Custom Instruction Selection
- Area constraints limit on-chip functionality
- An NP-hard 0-1 knapsack problem
- Formulated as an integer linear program (ILP)
4
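ILP Formulation for the Instruction Selection Problem
For each custom instruction i:
- Gain(i): estimated performance gain of i
- Area(i): estimated area of i
- Selected(i): 1 if i is selected; 0 otherwise
Goal: maximize the gain of the selected instructions.
Constraint: the area of the selected instructions must fit within the FPGA area.
$$\max \sum_i \mathrm{Gain}(i)\cdot\mathrm{Selected}(i) \quad \text{subject to} \quad \sum_i \mathrm{Area}(i)\cdot\mathrm{Selected}(i) \le \text{FPGA Area}, \quad \mathrm{Selected}(i) \in \{0,1\}$$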
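Because every Selected(i) is 0 or 1, this ILP is a 0-1 knapsack problem. As a minimal sketch, the code below swaps the ILP solver for the textbook dynamic program over integer area budgets, which assumes integer areas. The function name `select_instructions` and all example numbers are hypothetical.

```python
def select_instructions(gain, area, fpga_area):
    """0-1 knapsack by dynamic programming (stand-in for an ILP solver).

    gain, area: per-instruction lists (areas assumed to be integers)
    fpga_area:  integer area budget
    Returns (best total gain, sorted indices of selected instructions).
    """
    n = len(gain)
    # best[i][a] = max gain using only the first i instructions within area a
    best = [[0] * (fpga_area + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        g, w = gain[i - 1], area[i - 1]
        for a in range(fpga_area + 1):
            best[i][a] = best[i - 1][a]  # instruction i-1 not selected
            if w <= a and best[i - 1][a - w] + g > best[i][a]:
                best[i][a] = best[i - 1][a - w] + g  # instruction i-1 selected
    # Trace back which instructions produced the optimum.
    selected, a = [], fpga_area
    for i in range(n, 0, -1):
        if best[i][a] != best[i - 1][a]:
            selected.append(i - 1)
            a -= area[i - 1]
    return best[n][fpga_area], sorted(selected)
```

For example, `select_instructions([10, 7, 4], [17, 25, 14], 40)` returns `(14, [0, 2])`. Like the ILP, this sketch charges each instruction its full area, so it cannot account for resource sharing between selected instructions.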
5
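What About Resource Sharing?
[Figure: two DFGs with areas 17 and 25, and one datapath that shares resources between them; operator area costs 8, 5, 1, 3]
- Our resource-shared datapath: area = 28
- ILP area estimate: 17 + 25 = 42, i.e., 1.5 times the shared datapath's area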
6
Analysis
- The 0-1 knapsack problem formulation overestimated the area: 42 vs. 28, i.e., 150% of the shared datapath's area
- ILP solvers do not consider resource sharing
How to remedy this:
- Develop a resource sharing algorithm
- Avoid additive area estimates based on per-instruction costs
7
Resource Sharing for DFGs
Given: a set of DFGs G* = {G_1, …, G_n}
Goal: construct a consolidation graph G_C of minimal cost
Constraints:
- G_C must be acyclic
- G_C must be a supergraph of each G_i in G*
That's life: the problem is NP-hard.
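Both constraints on G_C can be checked mechanically. The sketch below assumes a hypothetical encoding in which each graph is a dict mapping nodes to operation types plus a set of edges, and each G_i comes with an explicit node mapping into G_C. The slides do not say how this correspondence is represented. The cycle test uses Python's standard `graphlib` module, and all function names are illustrative.

```python
from graphlib import TopologicalSorter, CycleError


def is_acyclic(nodes, edges):
    """True if the directed graph (nodes, edges) contains no cycle."""
    ts = TopologicalSorter({v: set() for v in nodes})
    for u, v in edges:
        ts.add(v, u)  # edge u -> v: v must come after u
    try:
        list(ts.static_order())
    except CycleError:
        return False
    return True


def is_valid_consolidation(gc_ops, gc_edges, dfgs, mappings):
    """Check that G_C is acyclic and is a supergraph of every G_i.

    gc_ops:   {node: operation type} for G_C
    gc_edges: set of (u, v) edges of G_C
    dfgs:     list of (ops, edges) pairs, one per G_i
    mappings: list of {G_i node: G_C node} dicts (hypothetical encoding)
    """
    if not is_acyclic(gc_ops.keys(), gc_edges):
        return False
    for (ops, edges), m in zip(dfgs, mappings):
        # Every node of G_i maps to a distinct G_C node...
        if set(m) != set(ops) or len(set(m.values())) != len(ops):
            return False
        # ...that performs the same operation...
        if any(gc_ops.get(m[v]) != op for v, op in ops.items()):
            return False
        # ...and every edge of G_i is present in G_C.
        if any((m[u], m[v]) not in gc_edges for u, v in edges):
            return False
    return True


def consolidation_cost(gc_ops, area_cost):
    """Area of G_C: the sum of its operators' area costs."""
    return sum(area_cost[op] for op in gc_ops.values())
```

Minimizing `consolidation_cost` subject to `is_valid_consolidation` is the NP-hard problem stated above. The checker only verifies a given candidate.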
8
Resource Sharing Overview: Path-Based Resource Sharing (PBRS)
Decompose the patterns into input-output paths.
[Figure: DFGs G1, G2, G3, G4]
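Decomposing a DFG into input-output paths amounts to enumerating every source-to-sink path of the DAG. Below is a minimal depth-first sketch. It assumes a hypothetical encoding of each DFG as a dict of node to operation type plus an edge list, and the name `io_paths` is illustrative. Note that the number of paths can grow exponentially with the size of the graph.

```python
def io_paths(ops, edges):
    """Enumerate every input-to-output (source-to-sink) path of a DAG.

    ops:   {node: operation type}  (hypothetical encoding)
    edges: iterable of (u, v) pairs
    Returns a list of paths, each a list of node ids.
    """
    succ = {v: [] for v in ops}
    indeg = {v: 0 for v in ops}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    paths = []

    def walk(v, prefix):
        prefix = prefix + [v]
        if not succ[v]:  # no successors: v is an output
            paths.append(prefix)
        for w in succ[v]:
            walk(w, prefix)

    for v in ops:
        if indeg[v] == 0:  # no predecessors: v is an input
            walk(v, [])
    return paths
```

Mapping each path through `ops` turns it into a string of operation types, such as `["mul", "sub", "add"]`. Substring and subsequence matching then run on these strings.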
10
Resource Sharing Overview
Use substring matching to share resources, then merge the DFGs along the matched nodes.
[Figure: DFGs G1, G2, G3, G4]
11
Resource Sharing Overview
Synthesizing G_C requires less area than synthesizing G1 … G4 separately.
[Figure: consolidation graph G_C built from G1, G2, G3, G4]
12
Path-Based Resource Sharing
[Figure: two paths, P1 and P2, drawn as operator sequences; operator area costs 8, 5, 1, 3]
13
Maximum Area Common Substring (MACStr)
Runs in O(L) time, where L is the length of the string.
[Figure: the MACStr of P1 and P2 highlighted; operator area costs 8, 5, 1, 3]
Area of MACStr = 26
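MACStr is the longest-common-substring problem with each matched operation weighted by its area cost. The sketch below is a minimal illustration using the standard quadratic dynamic program, O(|P1| * |P2|), rather than the O(L) method the slide cites. The operation names A to D and their assignment to the costs 8, 5, 1, 3 are hypothetical.

```python
def macstr(p1, p2, cost):
    """Maximum-area common substring of two operation sequences.

    Quadratic DP (not the O(L) algorithm cited on the slide).
    cost: {operation: area}, assumed positive.
    Returns (area, substring of p1).
    """
    best_area, best_end, best_len = 0, 0, 0
    # prev_area[j] / prev_len[j]: area and length of the common run
    # ending at p1[i-2] and p2[j-1] (the previous row of the DP table).
    prev_area = [0] * (len(p2) + 1)
    prev_len = [0] * (len(p2) + 1)
    for i in range(1, len(p1) + 1):
        cur_area = [0] * (len(p2) + 1)
        cur_len = [0] * (len(p2) + 1)
        for j in range(1, len(p2) + 1):
            if p1[i - 1] == p2[j - 1]:
                # Extend the diagonal run by one matched operation.
                cur_area[j] = prev_area[j - 1] + cost[p1[i - 1]]
                cur_len[j] = prev_len[j - 1] + 1
                if cur_area[j] > best_area:
                    best_area, best_end, best_len = cur_area[j], i, cur_len[j]
        prev_area, prev_len = cur_area, cur_len
    return best_area, p1[best_end - best_len:best_end]
```

With `cost = {"A": 8, "B": 5, "C": 1, "D": 3}`, `macstr("ABDCA", "BDACA", cost)` returns `(9, "CA")`. The match "CA" (area 9) beats the equally long "BD" (area 8), which shows that area, not length, drives the choice.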
14
Maximum Area Common Subsequence (MACSeq)
Runs in O(L²/log L) time, where L is the length of the string.
[Figure: the MACSeq of P1 and P2 highlighted; operator area costs 8, 5, 1, 3]
Area of MACSeq = 43
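MACSeq is the analogous area-weighted longest-common-subsequence problem. Because a subsequence need not be contiguous, it can share more area than MACStr. The minimal sketch below uses the standard O(|P1| * |P2|) dynamic program, not the O(L²/log L) algorithm the slide cites. The operation names and costs are the same hypothetical ones as before.

```python
def macseq(p1, p2, cost):
    """Maximum-area common subsequence of two operation sequences.

    Quadratic DP (not the O(L^2/log L) algorithm cited on the slide).
    Returns (area, one optimal subsequence as a string).
    """
    n, m = len(p1), len(p2)
    # a[i][j] = max area of a common subsequence of p1[:i] and p2[:j]
    a = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a[i][j] = max(a[i - 1][j], a[i][j - 1])
            if p1[i - 1] == p2[j - 1]:
                a[i][j] = max(a[i][j], a[i - 1][j - 1] + cost[p1[i - 1]])
    # Trace back one optimal subsequence.
    seq, i, j = [], n, m
    while i > 0 and j > 0:
        if p1[i - 1] == p2[j - 1] and a[i][j] == a[i - 1][j - 1] + cost[p1[i - 1]]:
            seq.append(p1[i - 1])
            i -= 1
            j -= 1
        elif a[i][j] == a[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return a[n][m], "".join(reversed(seq))
```

On the same pair, `macseq("ABDCA", "BDACA", cost)` reaches area 17, compared with 9 for MACStr. Both "ACA" and "BDCA" achieve that area.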
15
Resource Sharing Algorithm
Global phase: determine
- which DFGs to merge
- an initial path to merge
Local phase: aggressively apply PBRS to share resources between the DFGs selected by the global phase.
Repeat until all DFGs are merged or no further resource sharing is possible.
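The control structure of the global/local iteration is shown below as a toy sketch under a strong simplifying assumption. Each DFG is reduced to a multiset of operation types and its edges are ignored, so the "local phase" collapses to keeping the per-type maximum of the two multisets. Real PBRS merges along matched paths and must keep G_C acyclic. The function names and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def shared_area(g1, g2, cost):
    """Toy estimate of area saved: each op type is shared min(count) times."""
    return sum(cost[op] * min(g1[op], g2[op]) for op in g1.keys() & g2.keys())


def consolidate(dfgs, cost):
    """Toy global/local loop over DFGs abstracted to operation multisets."""
    graphs = dict(dfgs)  # name -> Counter of operation types
    while len(graphs) > 1:
        # Global phase: pick the pair of graphs that can share the most area.
        (a, b), saved = max(
            (((x, y), shared_area(graphs[x], graphs[y], cost))
             for x, y in combinations(sorted(graphs), 2)),
            key=lambda t: t[1],
        )
        if saved == 0:
            break  # no further resource sharing is possible
        # Local phase (toy): shared operations are kept once, i.e. the
        # per-type maximum of the two multisets (Counter union).
        graphs[a + "+" + b] = graphs.pop(a) | graphs.pop(b)
    return graphs


def total_area(graphs, cost):
    """Sum of operator area costs over all remaining graphs."""
    return sum(cost[op] * n for g in graphs.values() for op, n in g.items())
```

For example, with `cost = {"A": 8, "B": 5, "C": 1, "D": 3}` and the graphs `G1 = Counter(A=1, B=1)`, `G2 = Counter(A=1, C=1)`, `G3 = Counter(D=1)`, the loop merges G1 and G2, then stops because G3 shares nothing. The total area drops from 25 to 17.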
Slides 16-62: Algorithm Walkthrough
[Animated figure; operator area costs 8, 5, 1, 3; input DFGs G1, G2, G3, G4. The numeric node/edge annotations and path highlights are not recoverable.]
- Slide 16, Resource Sharing Algorithm: the four input DFGs G1, G2, G3, G4
- Slides 17-18, Global Phase: using MACSeq/MACStr, the global phase selects G1 and G2 and an initial path to merge
- Slides 19-28, Local Phase: PBRS merges G1 and G2 into G12
- Slides 29-31, Returning to Global Phase: G12 and G4 are selected
- Slides 32-44, Local Phase: G4 is merged into G12, forming G124; in "A Local Decision" and "Cycles are Illegal" (slides 38-43), a merge that would create a cycle is rejected as illegal and a legal alternative is taken
- Slides 45-61, Global and Local Phases: G3 is selected and merged into G124, forming G1234
- Slide 62, We're Done: G1234 is the consolidation graph of G1, G2, G3, and G4
63
We’re Done
[Figure: G1, G2, G3, G4 and their consolidation graph G1234; operator area costs 8, 5, 1, 3]
- Individual DFG areas: G1 = 17, G2 = 25, G3 = 14, G4 = 20
- Total area of the DFGs = 76
- Area of G1234 = 30
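As a ratio, the additive estimate for the four DFGs is 76 / 30 ≈ 2.53 times the area of G1234. The minimal sketch below performs that arithmetic. It assumes that "overestimates by X%" in this deck means the ratio of the additive estimate to the shared area, which matches 42 vs. 28 being reported as 150%. The function name is hypothetical.

```python
def overestimate_ratio(instruction_areas, shared_area):
    """Additive per-instruction area estimate divided by the shared area."""
    return sum(instruction_areas) / shared_area


# Two-DFG example: (17 + 25) / 28
print(overestimate_ratio([17, 25], 28))
# Four-DFG walkthrough: (17 + 25 + 14 + 20) / 30, rounded to two places
print(round(overestimate_ratio([17, 25, 14, 20], 30), 2))
```

The first call gives 1.5 and the second 2.53.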
64
Experimental Procedure
[Flow diagram: Machine-SUIF compiler (custom instruction generation) → set of patterns → consolidation graph construction algorithm → consolidation graph → pipeline synthesis or VLIW synthesis → area estimate]
65
Pipelined Datapath Synthesis
- Compiler: loop bodies account for 80-90% of program execution time
- Parallelism exists across multiple iterations
- A pipelined datapath yields maximal throughput
[Figure: data flow graph → insert registers and muxes]
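One common way to pipeline a DFG is to give each operation its ASAP level as its pipeline stage and place a register on every stage boundary a value crosses. The minimal sketch below works under that assumption. It also assumes that fanout from one operation shares a single delay chain and that each output gets one register. Muxes are ignored, and this is not claimed to be the paper's register-insertion procedure. The function name is hypothetical.

```python
from graphlib import TopologicalSorter


def pipeline_registers(nodes, edges):
    """ASAP pipeline stage per node and a pipeline-register count.

    Hypothetical model: one stage per ASAP level; a value that crosses k
    stage boundaries needs k registers, shared across its fanout; every
    output (sink) needs one register. Muxes are not modeled.
    """
    preds = {v: set() for v in nodes}
    succs = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    stage = {}
    for v in TopologicalSorter(preds).static_order():  # predecessors first
        stage[v] = 1 + max((stage[u] for u in preds[v]), default=-1)
    regs = 0
    for v in nodes:
        # Longest stage span to any consumer (1 for an output).
        regs += max((stage[w] - stage[v] for w in succs[v]), default=1)
    return stage, regs
```

For the chain a → b → c, this gives stages 0, 1, 2 and 3 registers. Adding a skip edge a → c raises the count to 4, because a's value must now be delayed across two stages.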
66
Pipelined Datapath Synthesis
[Figure: pipelined datapath synthesized from G_C, which covers G1, G2, G3, and G4]
67
VLIW Datapath Synthesis
- Compiler: non-loop computations
- Instruction-level parallelism
- Similar to latency-constrained scheduling in high-level synthesis
[Figure: data flow graph]
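Latency-constrained scheduling usually starts from the ASAP and ALAP schedules. Their difference (mobility) shows how far each operation can move within the latency bound. The minimal sketch below assumes every operation takes one cycle. Heuristics such as force-directed scheduling build on these bounds, and this sketch is not presented as the paper's VLIW synthesis method. The function name is hypothetical.

```python
from graphlib import TopologicalSorter


def asap_alap(nodes, edges, latency):
    """ASAP/ALAP cycles and mobility under a latency bound (unit-time ops)."""
    preds = {v: set() for v in nodes}
    succs = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)
        succs[u].add(v)
    order = list(TopologicalSorter(preds).static_order())
    # ASAP: each operation starts one cycle after its latest predecessor.
    asap = {}
    for v in order:
        asap[v] = max((asap[u] + 1 for u in preds[v]), default=0)
    if max(asap.values(), default=-1) + 1 > latency:
        raise ValueError("latency bound is shorter than the critical path")
    # ALAP: each operation starts one cycle before its earliest successor.
    alap = {}
    for v in reversed(order):
        alap[v] = min((alap[w] - 1 for w in succs[v]), default=latency - 1)
    mobility = {v: alap[v] - asap[v] for v in nodes}
    return asap, alap, mobility
```

For operations a and b feeding c, plus an independent d, a latency bound of 3 cycles gives ASAP cycles a=0, b=0, c=1, d=0 and ALAP cycles a=1, b=1, c=2, d=2. The independent operation d therefore has the most freedom (mobility 2).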
68
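Benchmark Suite: MediaBench
Columns: Exp. | Benchmark | File/Function | Num. Instrs. | Largest Instr. (Operations) | Avg. Ops per Instr.
[Table flattened in extraction; the row alignment is not recoverable. Column values in extracted order:]
- Exp.: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
- Benchmark: Mesa, PGP, Rasta, Epic, JPEG, MPEG2, Rasta
- File/Function: blend.c, idea.c, mul_mdmd_md.c, collapse_pyr, jpeg_fdct_ifast, jpeg_idct_4x4, jpeg_idct_2x2, idct_col, FR4TR, Lqsolve.c, idct_row
- Num. Instrs.: 6, 14, 5, 7, 21, 5, 8, 7, 9, 4, 10
- Largest Instr. (Operations): 18, 8, 6, 4, 9, 17, 12, 5, 30, 37, 25
- Avg. Ops per Instr. (10 values extracted for 11 rows): 5.5, 3.2, 3.0, 4.4, 7.0, 5.9, 3.1, 7.2, 20.0, 7.5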
69
Experimental Results
[Chart: area on the Xilinx E-1000; data values not recoverable]
71
Summary
- Area estimates based on resource sharing: the 0-1 knapsack problem formulation does not allow for resource-sharing estimates
- Resource sharing algorithm: PBRS applied to data flow graphs
- Experimental results: the ILP overestimates area costs by as much as 374% and 582% for pipelined and VLIW datapaths, respectively