FPGA Latency Optimization Using System-level Transformations and DFG Restructuring Daniel Gomez-Prado, Maciej Ciesielski, and Russell Tessier Department.

Slides:

Advertisements

Similar presentations

A Flow Graph Technique for DFT Controller Modification

Advertisements

© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.

ECE Synthesis & Verification - Lecture 2 1 ECE 667 Spring 2011 ECE 667 Spring 2011 Synthesis and Verification of Digital Circuits High-Level (Architectural)

1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.

Modern VLSI Design 2e: Chapter 8 Copyright  1998 Prentice Hall PTR Topics n High-level synthesis. n Architectures for low power. n Testability and architecture.

High-Level Constructors and Estimators Majid Sarrafzadeh and Jason Cong Computer Science Department

1 DAOmap: A Depth-optimal Area Optimization Mapping Algorithm for FPGA Designs Deming Chen, Jacon Cong ICCAD 2004 Presented by: Wei Chen.

Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.

A High Performance Application Representation for Reconfigurable Systems Wenrui GongGang WangRyan Kastner Department of Electrical and Computer Engineering.

Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 1 Process Scheduling for Performance Estimation and Synthesis.

 Based on the resource constraints a lower bound on the iteration interval is estimated  Synthesis targeting reconfigurable logic (e.g. FPGA) faces the.

1/20 Data Communication Estimation and Reduction for Reconfigurable Systems Adam Kaplan Philip Brisk Ryan Kastner Computer Science Elec. and Computer Engineering.

Center for Embedded Computer Systems University of California, Irvine Coordinated Coarse Grain and Fine Grain Optimizations.

Scheduling with Optimized Communication for Time-Triggered Embedded Systems Slide 1 Scheduling with Optimized Communication for Time-Triggered Embedded.

Data Partitioning for Reconfigurable Architectures with Distributed Block RAM Wenrui Gong Gang Wang Ryan Kastner Department of Electrical and Computer.

Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San.

Storage Assignment during High-level Synthesis for Configurable Architectures Wenrui Gong Gang Wang Ryan Kastner Department of Electrical and Computer.

Center for Embedded Computer Systems University of California, Irvine SPARK: A High-Level Synthesis Framework for Applying.

Merging Synthesis With Layout For Soc Design -- Research Status Jinian Bian and Hongxi Xue Dept. Of Computer Science and Technology, Tsinghua University,

Center for Embedded Computer Systems University of California, Irvine and San Diego Loop Shifting and Compaction for the.

VLSI DSP 2008Y.T. Hwang3-1 Chapter 3 Algorithm Representation & Iteration Bound.

DAC 2001: Paper 18.2 Center for Embedded Computer Systems, UC Irvine Center for Embedded Computer Systems University of California, Irvine

Center for Embedded Computer Systems University of California, Irvine and San Diego SPARK: A Parallelizing High-Level Synthesis.

EDA (CS286.5b) Day 18 Retiming. Today Retiming –cycle time (clock period) –C-slow –initial states –register minimization.

Petros OikonomakosBashir M. Al-Hashimi Mark Zwolinski Versatile High-Level Synthesis of Self-Checking Datapaths Using an On-line Testability Metric Electronics.

Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.

HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNES: AN EVOLUTIONARY APPROACH Tesi di Laurea di: Christian Pilato Matr.n Relatore: Prof.

CAFE router: A Fast Connectivity Aware Multiple Nets Routing Algorithm for Routing Grid with Obstacles Y. Kohira and A. Takahashi School of Computer Science.

Trigger design engineering tools. Data flow analysis Data flow analysis through the entire Trigger Processor allow us to refine the optimal architecture.

CAD Techniques for IP-Based and System-On-Chip Designs Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {

Network Aware Resource Allocation in Distributed Clouds.

LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization Harikrishnan K.C. University of Massachusetts Amherst.

1 Rapid Estimation of Power Consumption for Hybrid FPGAs Chun Hok Ho 1, Philip Leong 2, Wayne Luk 1, Steve Wilton 3 1 Department of Computing, Imperial.

Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns.

Section 10: Advanced Topics 1 M. Balakrishnan Dept. of Comp. Sci. & Engg. I.I.T. Delhi.

Electrical and Computer Engineering Muhammad Noman Ashraf Optimization of Data-Flow Computations Using Canonical TED Representation M. Ciesielski, D. Gomez-Prado,Q.

Implementation of Finite Field Inversion

1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.

05/04/06 1 Integrating Logic Synthesis, Tech mapping and Retiming Presented by Atchuthan Perinkulam Based on the above paper by A. Mishchenko et al, UCAL.

1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.

Array Synthesis in SystemC Hardware Compilation Authors: J. Ditmar and S. McKeever Oxford University Computing Laboratory, UK Conference: Field Programmable.

A Graph Based Algorithm for Data Path Optimization in Custom Processors J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski Center for Embedded Computer Systems.

L11: Lower Power High Level Synthesis(2) 성균관대학교 조 준 동 교수

Reconfigurable Computing Using Content Addressable Memory (CAM) for Improved Performance and Resource Usage Group Members: Anderson Raid Marie Beltrao.

ISSS 2001, Montréal1 ISSS’01 S.Derrien, S.Rajopadhye, S.Sur-Kolay* IRISA France *ISI calcutta Combined Instruction and Loop Level Parallelism for Regular.

1. Placement of Digital Microfluidic Biochips Using the T-tree Formulation Ping-Hung Yuh 1, Chia-Lin Yang 1, and Yao-Wen Chang 2 1 Dept. of Computer Science.

1 SYNTHESIS of PIPELINED SYSTEMS for the CONTEMPORANEOUS EXECUTION of PERIODIC and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS Paolo Palazzari Luca.

6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)

Lecture 6: Mapping to Embedded Memory and PLAs September 27, 2004 ECE 697F Reconfigurable Computing Lecture 6 Mapping to Embedded Memory and PLAs.

L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수

OPTIMIZING DSP SCHEDULING VIA ADDRESS ASSIGNMENT WITH ARRAY AND LOOP TRANSFORMATION Chun Xue, Zili Shao, Ying Chen, Edwin H.-M. Sha Department of Computer.

A High-Level Synthesis Flow for Custom Instruction Set Extensions for Application-Specific Processors Asia and South Pacific Design Automation Conference.

Wajid Minhass, Paul Pop, Jan Madsen Technical University of Denmark

University of Michigan Electrical Engineering and Computer Science 1 Compiler-directed Synthesis of Multifunction Loop Accelerators Kevin Fan, Manjunath.

HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH Tesi di Laurea di: Christian Pilato Matr.n Relatore: Prof.

CML Path Selection based Branching for CGRAs ShriHari RajendranRadhika Thesis Committee : Prof. Aviral Shrivastava (Chair) Prof. Jennifer Blain Christen.

Test complexity of TED operations Use canonical property of TED for - Software Verification - Algorithm Equivalence check - High Level Synthesis M ac iej.

Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 15: March 13, 2013 High Level Synthesis II Dataflow Graph Sharing.

ECE 587 Hardware/Software Co- Design Lecture 23 LLVM and xPilot Professor Jia Wang Department of Electrical and Computer Engineering Illinois Institute.

Resource Sharing in LegUp. Resource Sharing in High Level Synthesis Resource Sharing is a well-known technique in HLS to reduce circuit area by sharing.

Global Delay Optimization using Structural Choices Alan Mishchenko Robert Brayton UC Berkeley Stephen Jang Xilinx Inc.

Contents Introduction Bus Power Model Related Works Motivation

Architecture and Synthesis for Multi-Cycle Communication

A New Logic Synthesis, ExorBDS

Ph.D. in Computer Science

Hyunchul Park, Kevin Fan, Manjunath Kudlur,Scott Mahlke

Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.

ESE535: Electronic Design Automation

Michele Santoro: Further Improvements in Interconnect-Driven High-Level Synthesis of DFGs Using 2-Level Graph Isomorphism Michele.

ESE535: Electronic Design Automation

Presentation transcript:

FPGA Latency Optimization Using System-level Transformations and DFG Restructuring Daniel Gomez-Prado, Maciej Ciesielski, and Russell Tessier Department of Electrical and Computer Engineering University of Massachusetts, Amherst DATE13

Outline INTRODUCTION FPGA DESIGN LATENCY OPTIMIZATION DURING HIGH LEVEL SYNTHESIS REVIEW OF TAYLOR EXPANSION DIAGRAMS SYSTEM LEVEL EXPLORATION EXPERIMENTAL RESULTS CONCLUSIONS

INTRODUCTION Much FPGA design exploration work has focused on optimizations which are performed during traditional high- level synthesis (HLS), such as operation scheduling, register binding, and functional unit allocation. In this paper we show that it is possible to rapidly explore and modify dataflow graph representations at the behavioral level to improve FPGA design latency.

INTRODUCTION New design flow consists of the following phases: – 1) High-level algorithmic transformation using TEDs. – 2) High-level synthesis using GAUT [3]. – 3) RTL design synthesis and physical design (place and route) using Altera Quartus II for commercial Stratix II FPGA devices. We demonstrate that TED transformations can directly lead to an average of 22.6% post-mapped improvement in design latency.

FPGA DESIGN LATENCY OPTIMIZATION DURING HIGH LEVEL SYNTHESIS In Chen and Cong [1],a register binding algorithm with multiplexer optimization is modeled as a minimum cost flow in a network, and then a greedy algorithm is used to optimize performance. The problem of register binding for clock period minimization without register overhead is formulated in Huang and Chen [4].

FPGA DESIGN LATENCY OPTIMIZATION DURING HIGH LEVEL SYNTHESIS Cong et al. [5], the overall resource usage of functional units, registers and multiplexers is simultaneously optimized. The scheduler transmits global optimization information between each step of the algorithm. In Kim and Liu [6], a simultaneous register and functional unit binding algorithm targeting multiplexer input reduction to shorten the total interconnect length is developed.

FPGA DESIGN LATENCY OPTIMIZATION DURING HIGH LEVEL SYNTHESIS The low-power architectural synthesis system (LOPASS) [7] performs a simulated-annealing optimization over the entire synthesis process to effectively reduce power. Our method is complementary to the previous approaches as it restructures a dataflow graph prior to high level synthesis in an attempt to create a DFG which minimizes design latency.

REVIEW OF TAYLOR EXPANSION DIAGRAMS A TED is a canonical, graph-based data structure that can efficiently represent designs expressed as multivariate polynomial expressions.

REVIEW OF TAYLOR EXPANSION DIAGRAMS DFG Generation: – The DFG representation is not unique. While the number of operations remains fixed, a DFG can be further restructured and/or balanced to minimize latency. DFG Selection and Optimization: – DFG does not minimize the number of hardware resources. Such an optimization is only possible by performing scheduling and resource allocation for the selected DFG.

REVIEW OF TAYLOR EXPANSION DIAGRAMS TED-based Decomposition System (TDS): – TDS [8], which transforms the function extracted from a design specification into a TED and uses a host of TED- based decomposition and DFG optimization techniques to obtain an optimized DFG.

REVIEW OF TAYLOR EXPANSION DIAGRAMS

SYSTEM LEVEL EXPLORATION Estimating resource usage within the TED:

SYSTEM LEVEL EXPLORATION Computing the critical path delay from a TED: – The critical path delay algorithm traverses all nodes of the TED and computes the height. When an edge connecting a node to its children contains a register, the height contributed by that child is assigned 0 as if it were a primary input.

SYSTEM LEVEL EXPLORATION Iterative high level synthes is: – The annealing algorithm accepts and rejects the new costs which calibrates the window size into which a TED performs reordering and decomposition.

SYSTEM LEVEL EXPLORATION

Design latency (clock period × number of clock cycles)

EXPERIMENTAL RESULTS Best result is output by GAUT in VHDL and run through Altera Quartus synthesis, place and route to validate the improvements reported by GAUT.

EXPERIMENTAL RESULTS

CONCLUSIONS A new FPGA design latency optimization tool has been demonstrated. For a collection of benchmark designs our approach shows a 22% reduction in design latency.