1
A High Performance Application Representation for Reconfigurable Systems
Wenrui Gong, Gang Wang, Ryan Kastner
Department of Electrical and Computer Engineering
University of California
Santa Barbara, CA 93106-9560
{gong, wanggang, kastner}@ece.ucsb.edu
http://express.ece.ucsb.edu
June 22, 2004
2
6/21/2004 GONG et al: A High Performance Application Representation for Reconfigurable Systems
Outline
  - Reconfigurable computing systems
  - Compilation process
  - Synthesizing to hardware
  - Experimental results
  - Concluding remarks
3
Outline
  - Reconfigurable computing systems
      - Challenges of application representations
  - Compilation process
  - Synthesizing to hardware
  - Experimental results
  - Concluding remarks
4
Reconfigurable Computing Systems
  - Standard programmable platforms
      - Post-manufacturing customization
      - Designs shift from physical chips to configuration files
  - A software design flow
      - Features hardware speed with software flexibility
      - Enables higher productivity
5
Application Representations
  - A common application representation is needed to tame the complexity of system synthesis
  - Requirements
      - Able to generate software code for microprocessors
      - Easily translated to hardware configuration files
      - Allows a variety of transformations and optimizations that exploit the platform's performance
6
Parallelism Exploration
  - Fine grain parallelism
      - Multiple functional units
      - Issuing an operation to a free functional unit
      - Operations executed independently
  - Coarse grain parallelism
      - Executing multiple threads
      - With occasional synchronization
  - Reconfigurable computing systems support both fine and coarse grain parallelism
7
PDG + SSA
  - The PDG + SSA representation can be used for both hardware synthesis and software generation
  - The PDG and SSA forms are common representations for software generation
  - Here we concentrate on hardware synthesis
8
Outline
  - Reconfigurable computing systems
  - Compilation process
      - Overview
      - Constructing the PDG
      - Incorporating the SSA form
  - Synthesizing to hardware
  - Experimental results
  - Concluding remarks
9
Overview
10
Program Dependence Graph
  - PDG: Program Dependence Graph
  - ENTRY node: the root node of a PDG
  - PREDICATE nodes: produce predicate values from expressions
      - Diamond-shaped nodes 2, 3, and 4
  - STATEMENTS nodes: an arbitrary set of operations
      - Circle nodes 1, 4, 6, 7, and 8
  - REGION nodes: group all operations that share the same control conditions
      - House-shaped nodes R2, R3, R4 …
      - R3: the predicate value of 2 is True
  - Edges represent dependencies
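The four node kinds listed on this slide could be modeled with a small tagged structure. The following is a minimal sketch in C, not the authors' implementation: the names `PdgKind`, `PdgNode`, `pdg_add_child`, `pdg_count_statements`, and the fixed child limit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds, mirroring the four kinds named on the slide. */
typedef enum { PDG_ENTRY, PDG_PREDICATE, PDG_STATEMENTS, PDG_REGION } PdgKind;

#define PDG_MAX_CHILDREN 8  /* assumed limit so the sketch needs no allocation */

typedef struct PdgNode {
    PdgKind kind;
    int id;    /* label used in a figure, e.g. 2 for node 2, 3 for R3 */
    int cond;  /* REGION only: predicate value (1 = True) that enables it */
    struct PdgNode *child[PDG_MAX_CHILDREN];  /* control-dependence edges */
    size_t n_children;
} PdgNode;

/* Add a control-dependence edge parent -> child. */
void pdg_add_child(PdgNode *parent, PdgNode *child)
{
    assert(parent->n_children < PDG_MAX_CHILDREN);
    parent->child[parent->n_children++] = child;
}

/* Count STATEMENTS nodes reachable from n.
 * Assumes the edges form a tree (no back edges), so recursion terminates. */
size_t pdg_count_statements(const PdgNode *n)
{
    size_t total = (n->kind == PDG_STATEMENTS) ? 1 : 0;
    for (size_t i = 0; i < n->n_children; ++i)
        total += pdg_count_statements(n->child[i]);
    return total;
}
```

In this sketch a REGION node sits between a PREDICATE node and the operations it guards, so everything under one REGION shares the same control condition.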
11
Constructing the PDG from the CDFG
  - Implemented based on Ferrante’s algorithm
  - Uses the post-dominator tree

    val = pred;
    for (i = 0; i < len; ++i) {
        val += diff;
        if (val > 32767)
            val = 32767;
        else if (val < -32768)
            val = -32768;
    }
    return val;
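To make the role of post-dominators concrete, here is a minimal sketch in C that computes control dependences for a control-flow graph of the loop above. It is an illustration under stated assumptions, not the authors' code: the block numbering, the `Edge` type, and the function names are hypothetical, and it uses iterative bit-set post-dominator sets instead of an explicit post-dominator tree. The rule applied is the standard one: for a CFG edge X -> S, every node that post-dominates S but does not strictly post-dominate X is control dependent on X.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_NODES 8
enum { EXIT_NODE = 7 };

typedef struct { int from, to; } Edge;

/* Assumed block numbering for the loop on this slide:
 * 0 initialization, 1 test i < len, 2 val += diff and test val > 32767,
 * 3 val = 32767, 4 test val < -32768, 5 val = -32768, 6 ++i, 7 return. */
static const Edge cfg[] = {
    {0, 1}, {1, 2}, {1, 7}, {2, 3}, {2, 4},
    {3, 6}, {4, 5}, {4, 6}, {5, 6}, {6, 1}
};
#define N_EDGES (sizeof cfg / sizeof cfg[0])

/* pdom[n] is a bit set: bit y is set when y post-dominates n (n included). */
void post_dominators(uint32_t pdom[N_NODES])
{
    const uint32_t all = (1u << N_NODES) - 1u;
    assert(N_NODES < 32);
    for (int n = 0; n < N_NODES; ++n)
        pdom[n] = (n == EXIT_NODE) ? (1u << n) : all;
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int n = 0; n < N_NODES; ++n) {
            if (n == EXIT_NODE)
                continue;
            uint32_t meet = all;  /* intersection over all successors */
            for (size_t e = 0; e < N_EDGES; ++e)
                if (cfg[e].from == n)
                    meet &= pdom[cfg[e].to];
            const uint32_t next = meet | (1u << n);
            if (next != pdom[n]) { pdom[n] = next; changed = 1; }
        }
    }
}

/* cd[x] is a bit set: bit y is set when y is control dependent on x. */
void control_dependence(const uint32_t pdom[N_NODES], uint32_t cd[N_NODES])
{
    for (int n = 0; n < N_NODES; ++n)
        cd[n] = 0;
    for (size_t e = 0; e < N_EDGES; ++e) {
        const int x = cfg[e].from, s = cfg[e].to;
        const uint32_t strict_x = pdom[x] & ~(1u << x);
        cd[x] |= pdom[s] & ~strict_x;
    }
}
```

With this numbering, the loop test (block 1) controls blocks 1, 2, and 6, the first comparison (block 2) controls blocks 3 and 4, and the second comparison (block 4) controls block 5. Blocks that share the same controlling predicate and outcome are the ones a REGION node would group.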
12
Constructing the PDG (cont’d)
13
The Static Single Assignment Form
  - Each variable has exactly one assignment
  - A variable is always referenced by the same name
  - At join points of control flow, special φ-nodes are inserted

  Original code:

    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;

  SSA form:

    val_2 = val_1 + diff;
    if (val_2 > 32767)
        val_3 = 32767;
    else if (val_2 < -32768)
        val_4 = -32768;
    val_5 = phi(val_2, val_3, val_4);
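The renaming on this slide can be checked with a small runnable sketch in C. The function names `clamp_step`, `clamp_step_ssa`, and `check_equivalent`, and the predicate names `p_hi` and `p_lo`, are assumptions for illustration. Since C has no `phi`, the join is written as a nested conditional that picks whichever definition reaches the join point; computing `val_3` and `val_4` unconditionally is a simplification of this sketch.

```c
#include <assert.h>

/* Original fragment: val is assigned up to three times. */
int clamp_step(int val, int diff)
{
    val += diff;
    if (val > 32767)
        val = 32767;
    else if (val < -32768)
        val = -32768;
    return val;
}

/* SSA-style version: every name is assigned exactly once. */
int clamp_step_ssa(int val_1, int diff)
{
    const int val_2 = val_1 + diff;
    const int p_hi = (val_2 > 32767);    /* first predicate */
    const int p_lo = (val_2 < -32768);   /* second predicate */
    const int val_3 = 32767;
    const int val_4 = -32768;
    /* phi(val_2, val_3, val_4): select the definition reaching the join. */
    const int val_5 = p_hi ? val_3 : (p_lo ? val_4 : val_2);
    return val_5;
}

/* Both versions must agree for any input that does not overflow int. */
void check_equivalent(int val, int diff)
{
    assert(clamp_step(val, diff) == clamp_step_ssa(val, diff));
}
```

Because each name has a single definition, every use of `val_2` refers to exactly one producer, which is the property the later synthesis slides rely on.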
14
Extending the PDG with φ-Nodes
15
The Program Representation
  - Loop-independent φ-nodes
      - Take two or more input values and a predicate value
      - Commit one of the inputs depending on this predicate
  - Loop-carried φ-nodes
      - Inputs: the initial value, the loop-carried value, and a predicate value
      - Outputs: one to the iteration body, the other to the loop exit
      - Direct the proper values to the proper outputs
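The behavior of the two φ-node kinds can be sketched as a small software model in C. This is an illustrative assumption, not the authors' definition: the names `phi2`, `LoopPhiOut`, `loop_phi`, and `run_loop` are hypothetical, and the `first` flag (marking the first evaluation, when the initial value is used) is a modeling choice that the slide does not spell out.

```c
#include <assert.h>

/* Loop-independent phi with two inputs: commit one input per the predicate. */
int phi2(int pred, int if_true, int if_false)
{
    return pred ? if_true : if_false;
}

/* Hypothetical result of a loop-carried phi: the selected value and the
 * output it is directed to (1 = iteration body, 0 = loop exit). */
typedef struct {
    int value;
    int to_body;
} LoopPhiOut;

/* Loop-carried phi: chooses between the initial and the carried value,
 * then routes the result according to the loop predicate. */
LoopPhiOut loop_phi(int first, int init, int carried, int pred)
{
    LoopPhiOut out;
    out.value = first ? init : carried;
    out.to_body = (pred != 0);
    return out;
}

/* Drive a saturating-accumulate loop through the two phi models. */
int run_loop(int start, int diff, int len)
{
    int first = 1, i_carried = 0, val_carried = 0;
    assert(len >= 0);
    for (;;) {
        const int i = phi2(first, 0, i_carried);
        const int p = (i < len);  /* loop predicate */
        const LoopPhiOut val = loop_phi(first, start, val_carried, p);
        if (!val.to_body)
            return val.value;     /* value leaves through the exit output */
        const int val_2 = val.value + diff;
        val_carried = phi2(val_2 > 32767, 32767,
                           phi2(val_2 < -32768, -32768, val_2));
        i_carried = i + 1;
        first = 0;
    }
}
```

For example, `run_loop(0, 3, 4)` accumulates 3 four times and returns 12 through the exit output, while `run_loop(5, 1, 0)` routes the initial value straight to the exit.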
16
Outline
  - Reconfigurable computing systems
  - Compilation process
  - Synthesizing to hardware
      - Data-path elements
      - φ-nodes
  - Experimental results
  - Concluding remarks
17
Synthesizing the Data-Path
  - A one-to-one mapping is used
      - Different resource allocation and binding algorithms can be used (on-going work)
  - Each operation has an operator and several operands
      - Operands are synthesized directly to wires in the circuit
      - Each variable in the SSA form has only one definition point
  - PREDICATE nodes are synthesized to Boolean logic signals that control next-stage transitions and direct multiplexers to commit the correct value
18
Synthesizing φ-nodes
  - A loop-independent φ-node is synthesized to a multiplexer
      - The multiplexer selects among the input values depending on the predicate values
  - For a loop-carried φ-node, an additional switch is generated to direct the loop-exiting values
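At the bit level, the multiplexer and switch described on this slide could look like the following C sketch. It is a software model under assumptions, not generated hardware: the names `mux2`, `phi3`, `SwitchOut`, and `switch2`, the 16-bit word width, and the convention that an unselected switch output carries zero are all hypothetical. A three-input φ-node is modeled here as two cascaded 2:1 multiplexers.

```c
#include <assert.h>
#include <stdint.h>

/* 2:1 multiplexer built from AND/OR gates: returns a when sel is 1, else b. */
uint16_t mux2(unsigned sel, uint16_t a, uint16_t b)
{
    const uint16_t mask = (uint16_t)(0u - (sel & 1u));  /* 0xFFFF or 0x0000 */
    return (uint16_t)((a & mask) | (b & (uint16_t)~mask));
}

/* Three-input phi as two cascaded multiplexers.
 * The two predicates are mutually exclusive, as in the clamping example. */
uint16_t phi3(unsigned p_hi, unsigned p_lo,
              uint16_t val_2, uint16_t val_3, uint16_t val_4)
{
    assert(!(p_hi && p_lo));
    return mux2(p_hi, val_3, mux2(p_lo, val_4, val_2));
}

/* Switch for a loop-carried phi: routes the input to exactly one output. */
typedef struct {
    uint16_t to_body;
    uint16_t to_exit;
} SwitchOut;

SwitchOut switch2(unsigned stay_in_loop, uint16_t in)
{
    SwitchOut out;
    out.to_body = mux2(stay_in_loop, in, 0);
    out.to_exit = mux2(stay_in_loop, 0, in);
    return out;
}
```

Here `0x7FFF` and `0x8000` are the 16-bit two's-complement encodings of 32767 and -32768, so `phi3(1, 0, v, 0x7FFF, 0x8000)` commits the upper bound regardless of `v`.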
19
Synthesize to Hardware
  - Simplifications and optimizations
      - Removing unnecessary control dependencies
      - Cascading/expanding multipliers to obtain better performance
  - Flip-flops are inserted
      - Guarantee that correct values will be available no matter which execution path is taken
20
Outline
  - Reconfigurable computing systems
  - Compilation process
  - Synthesizing to hardware
  - Experimental results
      - Setup and benchmarks
      - Results
  - Concluding remarks
21
Setup and Benchmarks
  - Benchmark suites
      - Functions from the MediaBench suite
      - Profiled using sample data
      - Only report conservative results
  - Estimated execution time
      - Aggressive predicated execution
      - Only report conservative results
  - Area
      - One-to-one mapping without resource sharing
      - Reported in numbers of FPGA slices
22
Estimated Execution Time
23
Estimated Execution Time (cont’d)
24
Estimated FPGA Area
25
Outline
  - Reconfigurable computing systems
  - Compilation process
  - Synthesizing to hardware
  - Experimental results
  - Concluding remarks
      - On-going/future work
26
Concluding Remarks
  - The PDG+SSA form supports a variety of transformations and enables both coarse and fine grain parallelism
  - A method to synthesize this form to hardware
  - This form gives faster execution times than the CFG and PSSA forms while using similar area
27
On-going/Future Work
  - Investigate transformations to create coarse grained parallelism using the PDG+SSA form
  - Augment the PDG+SSA form with architectural information to provide fast estimation
  - Integrate resource sharing and other architectural synthesis techniques
28
Thank You
  - Prof. Ryan Kastner and Gang Wang
  - All audiences
29
Questions