A High Performance Application Representation for Reconfigurable Systems. Wenrui Gong, Gang Wang, Ryan Kastner. Department of Electrical and Computer Engineering.

Presentation transcript:

A High Performance Application Representation for Reconfigurable Systems
Wenrui Gong, Gang Wang, Ryan Kastner
Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA
{gong, wanggang,
June 22, 2004

6/21/2004 GONG et al: A High Performance Application Representation for Reconfigurable Systems 2
Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks

Outline
- Reconfigurable computing systems
  - Challenges of application representations
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks

Reconfigurable Computing Systems
- Standard programmable platforms
- Post-manufacturing customization
- Designs shift from physical chips to configuration files
- A software design flow
- Feature hardware speed with software flexibility
- Enable higher productivity

Application Representations
- A common application representation is needed to tame the complexity of system synthesis
- Requirements
  - Able to generate software code for microprocessors
  - Able to be easily translated into hardware configuration files
  - Allows a variety of transformations and optimizations to exploit performance

Parallelism Exploration
- Fine grain parallelism
  - Multiple functional units
  - Issuing an operation to a free functional unit
  - Operations executed independently
- Coarse grain parallelism
  - Executing multiple threads
  - With occasional synchronization
- Reconfigurable computing systems support both fine and coarse grain parallelism

PDG + SSA
- The PDG + SSA representation can be used for both hardware synthesis and software generation
- The PDG and SSA forms are common representations for software generation
- Here we concentrate on hardware synthesis

Outline
- Reconfigurable computing systems
- Compilation process
  - Overview
  - Constructing the PDG
  - Incorporating the SSA form
- Synthesizing to hardware
- Experimental results
- Concluding remarks

Overview

Program Dependence Graph
- PDG: Program Dependence Graph
- ENTRY node: the root node of a PDG
- PREDICATE nodes: produce predicate values from expressions
  - Diamond-shaped nodes 2, 3, and 4
- STATEMENTS nodes: an arbitrary set of operations
  - Circle nodes: 1, 4, 6, 7, and 8
- REGION nodes: group together all operations with the same control conditions
  - House-shaped nodes R2, R3, R4 …
  - R3: the predicate value of 2 is True
- Edges represent dependencies
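The four node kinds listed on this slide can be sketched as a small data structure. This is only an illustration: the class names, the `add` helper, the `control_conditions` function, and the node labels `P2`, `R3`, `R4`, `S6` are hypothetical and do not reproduce the figure from the talk.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    ENTRY = "entry"
    PREDICATE = "predicate"
    STATEMENTS = "statements"
    REGION = "region"


@dataclass
class Node:
    name: str
    kind: Kind
    # Outgoing control-dependence edges as (label, child) pairs. The label is
    # True/False on edges leaving a PREDICATE node and None everywhere else.
    children: list = field(default_factory=list)

    def add(self, child, label=None):
        self.children.append((label, child))
        return child


def control_conditions(root, target, path=()):
    """Return the (predicate name, value) pairs guarding `target`, or None."""
    if root is target:
        return list(path)
    for label, child in root.children:
        # Only edges leaving a predicate contribute a control condition.
        step = path + ((root.name, label),) if root.kind is Kind.PREDICATE else path
        found = control_conditions(child, target, step)
        if found is not None:
            return found
    return None


# Hypothetical fragment: ENTRY -> P2; P2 -True-> R3 -> S6; P2 -False-> R4
entry = Node("ENTRY", Kind.ENTRY)
p2 = entry.add(Node("P2", Kind.PREDICATE))
r3 = p2.add(Node("R3", Kind.REGION), label=True)
s6 = r3.add(Node("S6", Kind.STATEMENTS))
r4 = p2.add(Node("R4", Kind.REGION), label=False)

print(control_conditions(entry, s6))
```

Everything hanging under one REGION node shares that node's control conditions, which is what lets a synthesis tool treat those operations as candidates for parallel execution.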

Constructing the PDG from the CDFG
- Implemented based on Ferrante’s algorithm
- Using the post-dominator tree

Example code:
    var = pred;
    for (i = 0; i < len; ++i) {
        val += diff;
        if (val > 32767) val = 32767;
        else if (val < ) val = ;
    }
    return val;
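Control dependence, the core of PDG construction, can be sketched from post-dominator information as follows. This is a simple set-based formulation of the standard definition (Y is control dependent on X if Y post-dominates a successor of X but does not strictly post-dominate X). It is not the authors' implementation, which the slide says uses the post-dominator tree. The CFG below is a hand-made approximation of the loop on this slide, and its block names are assumptions.

```python
def post_dominators(cfg, exit_node):
    """Iteratively compute the post-dominator set of every node.

    Assumes every node except exit_node has at least one successor.
    """
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependence(cfg, exit_node):
    """Map each branch node to the set of nodes that are control dependent on it."""
    pdom = post_dominators(cfg, exit_node)
    deps = {}
    for x, succs in cfg.items():
        for s in succs:
            for y in pdom[s]:
                # y post-dominates s but does not strictly post-dominate x
                if y == x or y not in pdom[x]:
                    deps.setdefault(x, set()).add(y)
    return deps


# Hypothetical CFG for the saturating loop (block names are illustrative)
cfg = {
    "entry": ["head"],            # initialization
    "head": ["body", "ret"],      # i < len ?
    "body": ["hi", "lowtest"],    # val += diff; val > 32767 ?
    "hi": ["latch"],              # val = 32767
    "lowtest": ["lo", "latch"],   # lower-bound test
    "lo": ["latch"],              # clamp to the lower bound
    "latch": ["head"],            # ++i
    "ret": ["exit"],              # return val
    "exit": [],
}

deps = control_dependence(cfg, "exit")
for branch in sorted(deps):
    print(branch, "->", sorted(deps[branch]))
```

The loop test controls the loop body, the increment, and itself (the self-dependence marks the loop), while each clamping assignment depends only on its own comparison.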

Constructing the PDG (cont’d)

The Static Single Assignment Form
- Each variable has exactly one assignment
- A variable is always referenced by the same name
- At join points of control flow, special φ-nodes are inserted

Original code:
    val += diff;
    if (val > 32767) val = 32767;
    else if (val < ) val = ;

SSA form:
    val_2 = val_1 + diff;
    if (val_2 > 32767) val_3 = 32767;
    else if (val_2 < ) val_4 = ;
    val_5 = phi(val_2,val_3,val_4);
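The renaming on this slide can be checked with a small executable sketch. The lower bound is missing from the transcript, so the value `-32768` (the 16-bit signed minimum) is an assumption here, as are the helper names `phi`, `clamp_step`, and `clamp_step_ssa`. The `phi` helper models the φ-node as a guarded select, which is one possible reading and not necessarily the one used in the talk.

```python
HI = 32767
LO = -32768  # ASSUMPTION: the lower bound is elided in the transcript


def clamp_step(val, diff):
    """Original form: `val` is reassigned up to three times."""
    val += diff
    if val > HI:
        val = HI
    elif val < LO:
        val = LO
    return val


def phi(candidates):
    """Return the value whose guard holds; guards must be mutually exclusive."""
    for guard, value in candidates:
        if guard:
            return value
    raise ValueError("no guard holds")


def clamp_step_ssa(val_1, diff):
    """SSA form: every name is assigned exactly once."""
    val_2 = val_1 + diff
    p_hi = val_2 > HI
    p_lo = (not p_hi) and val_2 < LO
    val_3 = HI
    val_4 = LO
    # The phi merges the three definitions that reach the join point.
    val_5 = phi([(p_hi, val_3), (p_lo, val_4), (not p_hi and not p_lo, val_2)])
    return val_5


for start, step in [(10, 5), (32760, 100), (-32760, -100)]:
    print(start, step, clamp_step(start, step), clamp_step_ssa(start, step))
```

Because each name has a single definition, `val_3` and `val_4` can be computed unconditionally and in parallel with the comparisons; only the final selection depends on the predicates.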

Extending the PDG with φ-Nodes

The Program Representation
- Loop-independent φ-nodes
  - Take two or more input values and a predicate value
  - Commit one of the inputs depending on this predicate
- Loop-carried φ-nodes
  - Inputs: the initial value, the loop-carried value, and a predicate value
  - Outputs: one to the iteration body, and the other to the loop exit
  - Direct the proper values to the proper outputs
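The behavior of the two φ-node kinds can be modeled in a few lines. This is a behavioral sketch under stated assumptions: the function names are hypothetical, the explicit `first_iteration` flag is an added modeling device for choosing between the initial and the loop-carried value, and the lower clamp bound `-32768` is assumed because it is missing from the transcript.

```python
def phi_select(pred, if_true, if_false):
    """Loop-independent phi: commit one input depending on the predicate."""
    return if_true if pred else if_false


def phi_loop(first_iteration, initial, carried, continue_loop):
    """Loop-carried phi: choose the initial or carried value, then route it.

    Returns (to_body, to_exit); only one of the two outputs carries the value.
    """
    value = initial if first_iteration else carried
    return (value, None) if continue_loop else (None, value)


def accumulate(pred, diff, length, lo=-32768, hi=32767):
    """Run the saturating loop from the earlier slide using only phi-nodes."""
    first, i_next, val_next = True, None, None
    while True:
        i = phi_select(first, 0, i_next)       # merge initial and carried i
        cont = i < length                       # loop predicate
        val_body, val_exit = phi_loop(first, pred, val_next, cont)
        if not cont:
            return val_exit                     # value routed to the loop exit
        v = val_body + diff                     # value routed to the body
        val_next = phi_select(v > hi, hi, phi_select(v < lo, lo, v))
        i_next = i + 1
        first = False


print(accumulate(5, 3, 4))
```

Nesting two `phi_select` calls for the clamp mirrors a cascade of two-input selectors; a single three-input selector would be an equally valid model.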

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
  - Data-path elements
  - φ-nodes
- Experimental results
- Concluding remarks

Synthesizing the Data-Path
- A one-to-one mapping is used
  - Different resource allocation and binding algorithms can be used (on-going work)
- Each operation has an operator and several operands
  - Operands are synthesized directly to wires in the circuit
  - Each variable in the SSA form has only one definition point
- PREDICATE nodes: synthesized to Boolean logic signals that control next-stage transitions and direct multiplexers to commit the correct value

Synthesizing φ-nodes
- A loop-independent φ-node is synthesized to a multiplexer. The multiplexer selects among its input values depending on the predicate values.
- For a loop-carried φ-node, an additional switch is generated to direct the loop-exiting values

Synthesize to Hardware
- Simplifications and optimizations
  - Removing unnecessary control dependencies
  - Cascading/expanding multipliers to obtain better performance
- Flip-flops are inserted
  - Guarantee that correct values will be available no matter which execution path is taken

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
  - Setup and benchmarks
  - Results
- Concluding remarks

Setup and Benchmarks
- Benchmark suites
  - Functions from the MediaBench suite
  - Profiled using sample data
  - Only report conservative results
- Estimated execution time
  - Aggressive predicated execution
  - Only report conservative results
- Area
  - One-to-one mapping without resource sharing
  - Reported in numbers of FPGA slices

Estimated Execution Time

Estimated Execution Time (cont’d)

Estimated FPGA Area

Outline
- Reconfigurable computing systems
- Compilation process
- Synthesizing to hardware
- Experimental results
- Concluding remarks
  - On-going/future work

Concluding Remarks
- The PDG+SSA form supports a variety of transformations and enables both coarse and fine grain parallelism
- A method is presented to synthesize this form to hardware
- This form gives faster execution times using similar area when compared with the CFG and PSSA forms

On-going/Future Work
- Investigate transformations to create coarse grained parallelism using the PDG+SSA form
- Augment the PDG+SSA form with architectural information to provide fast estimation
- Integrate resource sharing and other architectural synthesis techniques

Thank You
- Prof Ryan Kastner and Gang Wang
- All audiences

Questions