Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4) Prof. Sherief Reda.

Presentation transcript:

Reconfigurable Computing (EN2911X, Fall07) Lecture 08: RC Principles: Software (1/4) Prof. Sherief Reda, Division of Engineering, Brown University

Summary of current status

Past lectures
- Understood the principles of the hardware part of reconfigurable computing: programmable logic technology.
- Learned how to program reconfigurable fabrics using hardware description languages (Verilog).

Next lectures
- Understand the principles of the software part of reconfigurable computing (which we have already partly used).
- Learn how to program reconfigurable fabrics using system-level design languages (SystemC).

Reconfigurable computing design flow

[Figure: design flow. The system specification is partitioned into SW and HW parts. SW path: compile, link, executable image. HW path: compiling to Verilog, synthesis, mapping, place & route, configuration file, download to board. Figure annotation: "so far we have only experienced this portion".]

System specification

Use high-level languages (HLLs) such as C, C++, Java, or MATLAB.

Advantages:
- Since systems consist of both SW and HW, the entire system can be described with the same specification.
- Fast to code, debug, and verify that the system is working.

Disadvantages:
- No support for concurrency.
- No notion of time (clock or delay).
- A different communication model from HW, which uses signals.
- Missing data types (e.g., bit vectors, logic values).

How can we overcome these disadvantages?

Using HLL for hardware/software specification

- Augment the HLL (e.g., C++) with a new library that supports additional hardware-like functionality (e.g., SystemC).
  - Unified language across all stages of platform design
  - Fast simulation
  - There are already lots of tools for C++
  → We will come to this part in detail later.
- Enable compilers to optimize code and extract concurrency from sequential code to map it onto FPGAs.

[from G. De Micheli]

Hardware-software partitioning

Given a system specification, decompose (partition) it into tasks (functional objects) and label each task as HW or SW such that the system cost/performance is optimized and all constraints on resources and cost are satisfied.

The exact performance depends on the computational model at hand.
- For the same application, a system with an FPGA on a slow bus leads to a model with different performance parameters than a system with an FPGA as a coprocessor.
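To make the dependence on the computational model concrete, here is a minimal sketch of a runtime estimate for one candidate partition. The additive model (SW time + HW time + transfer time, with no overlap), the function name system_time, and all numbers are illustrative assumptions, not figures from the lecture.

```python
def system_time(t_sw_only, f_hw, hw_speedup, bytes_moved, bus_bandwidth):
    """Estimated runtime in seconds under a simple additive model (an assumption).

    t_sw_only     -- runtime of the all-software implementation
    f_hw          -- fraction of that runtime moved to hardware
    hw_speedup    -- how many times faster the hardware runs its share
    bytes_moved   -- data exchanged between SW and HW
    bus_bandwidth -- bytes per second available between CPU and FPGA
    """
    t_sw = (1.0 - f_hw) * t_sw_only          # work that stays in software
    t_hw = f_hw * t_sw_only / hw_speedup     # work executed on the FPGA
    t_comm = bytes_moved / bus_bandwidth     # HW/SW data transfer
    return t_sw + t_hw + t_comm


# Same application and same partition, two hypothetical platforms.
slow_bus = system_time(10.0, 0.8, 20.0, 400e6, 100e6)    # FPGA behind a slow bus
coprocessor = system_time(10.0, 0.8, 20.0, 400e6, 4e9)   # FPGA as a coprocessor

print(f"slow bus:    {slow_bus:.2f} s, speedup {10.0 / slow_bus:.2f}x")
print(f"coprocessor: {coprocessor:.2f} s, speedup {10.0 / coprocessor:.2f}x")
```

In this sketch only the bandwidth changes between the two calls, yet the estimated speedup differs by more than a factor of two, which is why a partition has to be evaluated against the model of the target platform.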

HW/SW partitioning model

[Figure: the tasks of a program (int main() { ... }) assigned to SW and HW.]

Good partitioning criteria:
1. Minimize communication (traffic) between HW and SW and on the bus.
2. Maximize concurrency (reduce stalling), so that HW and SW run in parallel.
3. Maximize the utilization of the HW resources.
→ Minimize the total execution runtime.

Profiling is a key step in HW/SW partitioning

- Determine the candidate HW partitions by first profiling the tasks of the specification on typical data sets.
- Given a candidate SW/HW partition:
  - Estimate the HW implementation.
  - Determine the system performance and the speedup over software.

How can we generate candidate SW/HW partitions?
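As a rough illustration of how profile data can steer the choice of HW candidates, the sketch below ranks tasks by their share of the measured runtime and bounds the achievable speedup with Amdahl's law. The task names and timings are hypothetical, and communication overhead is ignored here.

```python
def rank_candidates(profile):
    """Return (task, fraction of total runtime) pairs, largest share first."""
    total = sum(profile.values())
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    return [(name, seconds / total) for name, seconds in ranked]


def amdahl_speedup(fraction, hw_speedup):
    """Overall speedup when `fraction` of the runtime is accelerated by `hw_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / hw_speedup)


# Hypothetical profile: seconds spent in each task on a typical data set.
profile = {"fir_filter": 6.0, "fft": 3.0, "parse_input": 0.5, "write_output": 0.5}

ranked = rank_candidates(profile)
top_task, top_fraction = ranked[0]
estimate = amdahl_speedup(top_fraction, 10.0)        # hardware assumed 10x faster
bound = amdahl_speedup(top_fraction, float("inf"))   # infinitely fast hardware

print(ranked)
print(f"moving {top_task} to HW: about {estimate:.2f}x, at most {bound:.2f}x")
```

The bound shows why profiling comes first: a task that accounts for a small share of the runtime cannot yield a large system speedup, however fast its hardware implementation is.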

HW/SW partitioning algorithms

Total size is constrained by the number and size of the available FPGA(s).

[Figure: tasks divided between SW tasks and HW tasks; plot of execution time versus moves showing a local optimum and the global optimum.]

Kernighan/Lin - Fiduccia/Mattheyses algorithm
- Start with all task vertices free to swap/move (unlocked).
- Label each possible swap/move with the immediate change in execution time that it causes (its gain).
- Iteratively select and execute the swap/move with the highest gain (whether positive or negative), then lock the moved vertex so that it cannot move again during the pass.
- The best solution seen during the pass is adopted as the starting solution for the next pass.
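The pass structure described above can be sketched as follows. This is a simplified, move-only variant written for illustration: it recomputes the cost for every trial move instead of maintaining gain buckets, and the cost model (sum of task times plus a penalty for every edge crossing the HW/SW boundary), the task set, and the FPGA capacity are all assumptions.

```python
def cost(side, tasks, edges):
    """Execution time under an assumed sequential model: task times plus cut traffic."""
    total = sum(tasks[v]["hw" if side[v] == "HW" else "sw"] for v in tasks)
    total += sum(w for (u, v), w in edges.items() if side[u] != side[v])
    return total


def hw_area(side, tasks):
    return sum(tasks[v]["area"] for v in tasks if side[v] == "HW")


def flipped(label):
    return "SW" if label == "HW" else "HW"


def fm_pass(side, tasks, edges, capacity):
    """One pass: move each task at most once, keep the best partition seen."""
    side = dict(side)
    best, best_cost = dict(side), cost(side, tasks, edges)
    unlocked = set(tasks)
    while unlocked:
        current = cost(side, tasks, edges)
        candidates = []
        for v in sorted(unlocked):
            trial = dict(side)
            trial[v] = flipped(side[v])
            if hw_area(trial, tasks) <= capacity:       # respect the FPGA size
                candidates.append((current - cost(trial, tasks, edges), v))
        if not candidates:
            break
        gain, v = max(candidates, key=lambda c: c[0])   # may be negative
        side[v] = flipped(side[v])
        unlocked.remove(v)                              # lock the moved vertex
        if current - gain < best_cost:
            best, best_cost = dict(side), current - gain
    return best, best_cost


def partition(tasks, edges, capacity):
    """Repeat passes from the all-software solution until a pass stops improving."""
    side = {v: "SW" for v in tasks}
    best_cost = cost(side, tasks, edges)
    while True:
        new_side, new_cost = fm_pass(side, tasks, edges, capacity)
        if new_cost >= best_cost:
            return side, best_cost
        side, best_cost = new_side, new_cost


# Hypothetical tasks: SW time, HW time, HW area; edge weights are transfer costs.
tasks = {
    "A": {"sw": 10, "hw": 2, "area": 3},
    "B": {"sw": 10, "hw": 2, "area": 3},
    "C": {"sw": 2, "hw": 3, "area": 3},
}
edges = {("A", "B"): 9, ("B", "C"): 1}

side, total = partition(tasks, edges, capacity=6)
print(side, total)
```

In this example every single move from the all-software start makes the cost worse, because A and B communicate heavily. Accepting the least bad move (A to HW) and locking it lets the next move (B to HW) reach a much better solution, which is the reason the pass accepts negative gains.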

Low-level partitioning from software binaries

Rather than partitioning from the high-level description, it is possible to compile the program as SW and then partition the resulting executable binary into SW and HW parts.

Advantages:
- No need to worry about which language is being used.
- Can be used to develop dynamic runtime partitioners and synthesizers.

Main steps:
1. Decompilation of the binary to recover high-level information
2. Partitioning and synthesis
3. Binary updating to account for the SW parts that migrated to HW

Compilation

Reconfigurable computing can execute multiple operations in parallel through the spatial distribution of computing resources.

When compiling a sequential SW language such as C into a concurrent language such as Verilog, parallelism must be introduced either
- manually, by instructing the compiler through special instructions or compiler directives, or
- automatically, by the compiler.

How can the compiler automatically extract parallelism?

Data-flow graphs (DFG)

A data-flow graph (DFG) is a graph that represents the data dependencies between a number of operations.

Dependencies arise for various reasons:
- An input to an operation can be the output of another operation.
- Serialization constraints, e.g., loading data on a bus and then raising a flag.
- Sharing of resources.

A data-flow graph represents operations and data dependencies:
- The vertex set is in one-to-one correspondence with the set of tasks (operations).
- A directed edge corresponds to the transfer of data from one operation to another.

[Figure: example DFG with a + vertex and values a, b, c.]
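A minimal sketch of how such a graph might be built from straight-line code: each operation becomes a vertex, and an edge is added whenever an operand was produced by an earlier operation. The tuple encoding (result, operator, operands...) and the function name build_dfg are assumptions for illustration. Only the first kind of dependency (output feeding input) is modeled, not serialization or resource sharing.

```python
def build_dfg(ops):
    """Return edges (producer index, consumer index) between operations.

    ops is a list of (result, operator, operand, ...) tuples in program order.
    """
    last_writer = {}   # variable name -> index of the operation that produced it
    edges = []
    for i, (result, _operator, *operands) in enumerate(ops):
        for name in operands:
            edge = (last_writer.get(name), i)
            if edge[0] is not None and edge not in edges:
                edges.append(edge)
        last_writer[result] = i
    return edges


# t1 = a * b;  t2 = t1 + c;  t3 = t1 * t2;
ops = [
    ("t1", "*", "a", "b"),
    ("t2", "+", "t1", "c"),
    ("t3", "*", "t1", "t2"),
]
edges = build_dfg(ops)
print(edges)
```

Operands such as a, b, and c have no producer in the list, so they act as primary inputs of the graph and contribute no edges.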

Consider the following example [De Micheli '94]

Design a circuit to numerically solve the following differential equation in the interval [0, a] with step size dx:

[Equation image not captured in the transcript.]

    read(x, y, u, dx, a);
    do {
        xl = x + dx;
        ul = u - (3*x*u*dx) - (3*y*dx);
        yl = y + u*dx;
        c = xl < a;
        x = xl; u = ul; y = yl;
    } while (c);
    write(y);
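For experimenting with the example in software, here is a direct Python transcription of the loop, assuming that the values xl, ul, and yl computed in one iteration are carried into x, u, and y for the next. Read as a forward-Euler iteration with u standing for dy/dx, the update rules appear to correspond to y'' + 3xy' + 3y = 0; that reading is inferred from the code, not taken from the slide. The function name diffeq and the input values are illustrative.

```python
def diffeq(x, y, u, dx, a):
    """Step the loop from the example until x reaches a; return the final y."""
    while True:                      # emulates do { ... } while (c)
        xl = x + dx
        ul = u - (3 * x * u * dx) - (3 * y * dx)
        yl = y + u * dx
        c = xl < a
        x, u, y = xl, ul, yl         # carry the new values into the next iteration
        if not c:
            break
    return y


# Two steps of size 0.5 over the interval [0, 1], starting from y = 1, u = 0.
result = diffeq(x=0.0, y=1.0, u=0.0, dx=0.5, a=1.0)
print(result)
```

A software reference like this is useful later as a golden model when checking a hardware implementation of the same loop.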

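Data-flow graph example

    xl = x + dx;
    ul = u - (3*x*u*dx) - (3*y*dx);
    yl = y + u*dx;
    c = xl < a;

[Figure: data-flow graph of the four statements above, with six multiplication vertices, two subtraction vertices, two addition vertices, and one comparison vertex; inputs 3, x, u, dx, y, a; outputs xl, ul, yl, c.]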

Detecting concurrency from DFGs

[Figure: the data-flow graph of the example, drawn with a NOP vertex.]

- Paths in the graph represent concurrent streams of operations.
- Extended DFGs: vertices can represent links to other DFGs, forming a hierarchy of graphs.
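One simple way to expose the concurrency in a DFG is to assign each operation the earliest step at which all of its inputs are available (as-soon-as-possible levels); operations on the same level have no dependencies among them. The sketch below applies this to the loop body of the differential-equation example. The decomposition of 3*x*u*dx into (3*x)*(u*dx) and the temporary names m1..m6 and s1 are assumptions; other groupings are possible.

```python
def asap_levels(ops):
    """Map each result to 1 + the highest level among its producers (inputs are level 0)."""
    level = {}
    for result, _operator, *operands in ops:
        level[result] = 1 + max((level.get(name, 0) for name in operands), default=0)
    return level


def group_by_level(level):
    groups = {}
    for name, step in level.items():
        groups.setdefault(step, []).append(name)
    return groups


# Loop body of the example in (result, operator, operands...) form.
ops = [
    ("m1", "*", "3", "x"),
    ("m2", "*", "u", "dx"),
    ("m3", "*", "m1", "m2"),    # 3*x*u*dx
    ("m4", "*", "3", "y"),
    ("m5", "*", "m4", "dx"),    # 3*y*dx
    ("m6", "*", "u", "dx"),
    ("s1", "-", "u", "m3"),
    ("ul", "-", "s1", "m5"),
    ("yl", "+", "y", "m6"),
    ("xl", "+", "x", "dx"),
    ("c", "<", "xl", "a"),
]

groups = group_by_level(asap_levels(ops))
for step in sorted(groups):
    print(step, groups[step])
```

With this decomposition the eleven operations fit into four levels, so with enough hardware resources the loop body needs four steps instead of eleven sequential ones.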

Control/data-flow graphs (CDFG)

- Control-flow information (branching and iteration) can also be represented graphically.
- Data-flow graphs can be extended by introducing branching vertices, which represent operations that evaluate conditional clauses.
- Iteration can be modeled as a branch based on the iteration exit condition.
- Vertices can also represent model calls.

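CDFG example

    x = a * b;
    y = x * c;
    z = a + b;
    if (z >= 0) {
        p = m + n;
        q = m * n;
    }

[Figure: CDFG of the code above, with NOP vertices, multiplication and addition vertices, and a branch (BR) vertex whose body contains the addition and the multiplication that compute p and q.]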
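One possible in-memory representation of this example is a hierarchical graph in which the branch vertex holds its condition and a link to a nested graph for its body. The sketch below encodes the example that way and walks it with a small sequential interpreter; the tuple encoding, the "BR" tag, and the input values are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, ">=": operator.ge}


def value(env, operand):
    """Operands are either variable names (str) or integer constants."""
    return env[operand] if isinstance(operand, str) else operand


def evaluate(graph, env):
    """Walk the graph in order; a BR vertex runs its nested body only if its condition holds."""
    for vertex in graph:
        if vertex[0] == "BR":
            _, (op, lhs, rhs), body = vertex
            if OPS[op](value(env, lhs), value(env, rhs)):
                evaluate(body, env)
        else:
            result, op, lhs, rhs = vertex
            env[result] = OPS[op](value(env, lhs), value(env, rhs))
    return env


cdfg = [
    ("x", "*", "a", "b"),
    ("y", "*", "x", "c"),
    ("z", "+", "a", "b"),
    ("BR", (">=", "z", 0), [        # branch vertex with a link to its body graph
        ("p", "+", "m", "n"),
        ("q", "*", "m", "n"),
    ]),
]

taken = evaluate(cdfg, {"a": 2, "b": 3, "c": 4, "m": 5, "n": 6})
skipped = evaluate(cdfg, {"a": -2, "b": -3, "c": 4, "m": 5, "n": 6})
print(taken)
print(skipped)
```

The interpreter runs the vertices one after another only to keep the sketch short. In the graph itself, y and z do not depend on each other, and neither do p and q, so a hardware implementation is free to compute each pair in parallel.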

Next lecture

Parallelism extraction and optimization from DFGs