HW/SW partitioning in the implementation of real-time systems on FPGAs with softcores.



Outline
- Intro & Motivation
- Model
- Algorithms
- Experiments

Intro and Motivation
- Past work on design optimization for single-processor scheduling
  - The schedulability condition can be viewed as a feasibility region in the domain of the design variables
  - This region is convex for EDF under reasonable assumptions
- Availability of softcores for FPGAs
  - NIOS II for Altera
- Co-design problem
  - A functionality can be implemented in HW (inside the FPGA) or in SW (inside or outside the FPGA), executed by one or more softcores (how many?)

Motivation
- Start from a system model (Simulink)
- Explore different HW design options (… NIOS)
- For each design option, find the optimal design configuration by means of convex linear optimization
- HW implementation is subject to area constraints
- SW implementation is subject to schedulability constraints

HW (area) Constraints
Models available:
- Single-dimension condition: linear bound
[Figure: "slotted" and "linear" area models]

HW (area) Constraints
Models available:
- 2 dimensions: cutting stock problem
  - Complex, more realistic and extremely well-studied problem (real-world implications)
  - Linear bound solutions can be found in the operations research literature

Reality of FPGAs (additional resource constraints)

Schedulability constraints
- EDF (or L&L sufficient) bound
- How realistic is it?
  - Implementations of FP and EDF on NIOS exist
  - What about deadlines = periods, task independence, and so on?
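The two bounds named above can be sketched in a few lines. This is a minimal illustration, assuming independent periodic tasks with deadlines equal to periods; the function names and the `(wcet, period)` task representation are hypothetical and do not come from the slides.

```python
from fractions import Fraction


def total_utilization(tasks):
    """Sum of wcet/period over all tasks; tasks is a list of (wcet, period) pairs."""
    return sum((Fraction(c) / Fraction(t) for c, t in tasks), Fraction(0))


def liu_layland_bound(n):
    """Liu & Layland sufficient utilization bound for n fixed-priority (rate-monotonic) tasks."""
    return n * (2 ** (1.0 / n) - 1)


def edf_schedulable(tasks):
    """EDF test for independent periodic tasks with deadline = period: U <= 1."""
    return total_utilization(tasks) <= 1


def fp_sufficient(tasks):
    """Sufficient (not necessary) fixed-priority test based on the L&L bound."""
    tasks = list(tasks)
    if not tasks:
        return True  # an empty task set is trivially schedulable
    return float(total_utilization(tasks)) <= liu_layland_bound(len(tasks))
```

A task set with total utilization exactly 1 passes the EDF test but fails the sufficient fixed-priority test, which is why the choice of bound changes the size of the feasible region.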

The Model Starting point: Simulink model

The Model: implementation of a Simulink model
- HW implementation: commercial tools exist (Celoxica) for implementing Simulink blocks in an FPGA.

The Model
- SW implementation: commercial tools exist, Real-Time Workshop + Embedded Coder (MathWorks) or TargetLink (dSPACE), for implementing Simulink blocks as a set of concurrent threads.
- Threads inherit the sampling period of the blocks (periodic model)
- No overrun is permitted (deadlines = periods)
- Communication is by switched buffers (asynchronous; tasks are independent)
- Code generation and switched buffers are not commercially available for EDF, but nothing prevents their implementation

The Model
- FPGA = rectangular area of Logic Elements (LEs). All dimensions are expressed in LEs
- FPGA height = H, FPGA width = W
- Assume a homogeneous two-dimensional model of the FPGA (array of LEs)
- k softcores CPU_l, l = 1..k, are implemented in the FPGA; each core requires an area sw × sh (k = 0, 1, 2, …)
[Figure: FPGA of width W and height H containing a softcore of width sw and height sh]

The Model
- System model = network of blocks
- V = {F_1, F_2, …, F_n} is the set of functional blocks
- A block F_i can be implemented in HW or SW according to the value of s_il ∈ {0,1}: s_il = 1 if block F_i is executed in SW on CPU_l.
- If not executed in SW, a block MUST be implemented in HW.
- If implemented in HW, a block requires an area w_i × h_i
- If implemented in SW, a block F_i has a worst-case computation time C_i and a period of execution t_i (a HW implementation has C_i ≈ 0)
- u_i = C_i / t_i

The Model
- If implemented in SW, a block is executed in the context of a thread with the same period.
- m_ij = 1 if F_i is mapped for execution in thread τ_j, and 0 otherwise (these are constants, not optimization variables)
- Schedulability constraint (for each NIOS)
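The formula for the schedulability constraint is not in the transcript. Under the stated assumptions (EDF, deadlines equal to periods, independent tasks), a plausible reconstruction from the definitions of s_il and u_i is one linear utilization constraint per softcore:

```latex
\sum_{i=1}^{n} s_{il}\, u_i \;\le\; 1, \qquad l = 1, \dots, k
```

Because every block belongs to exactly one thread (the constants m_ij), summing over blocks is equivalent to summing over threads. If the Liu & Layland bound is used instead of the EDF bound, the right-hand side 1 is replaced by that bound.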

Results to be exploited
- Cutting stock approximate (linear) solution: level packing (Lodi)
- Pack the items in rows forming levels
  - The first level is the bottom of the bin, the second level is built on top of the first, and so on
- In each level, the leftmost item is the tallest one
- The bottom level is the tallest one
- Items are sorted and renumbered by non-increasing h_i values.

Results to be exploited An example: there are n potential levels (one for each initializing block)

Results to be exploited
- Variables: y_i = 1 if item i initializes level i, and 0 otherwise
- Objective (original): minimize the height of the required rectangle

Results to be exploited
- Constraints (original):
  - x_ij, i ∈ {1..n-1}, j > i: x_ij = 1 if item j is packed in level i, 0 otherwise
  - Each item is packed exactly once
  - Width constraint
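The formulas for the objective and constraints are not in the transcript. The following is a reconstruction, assuming the standard level-packing integer program from the operations research literature, with W the strip width and items renumbered by non-increasing height:

```latex
\begin{aligned}
\min \quad & \sum_{i=1}^{n} h_i\, y_i \\
\text{s.t.} \quad & \sum_{i=1}^{j-1} x_{ij} + y_j = 1, && j = 1, \dots, n
  && \text{(each item is packed exactly once)} \\
& \sum_{j=i+1}^{n} w_j\, x_{ij} \le (W - w_i)\, y_i, && i = 1, \dots, n-1
  && \text{(width constraint)} \\
& y_i \in \{0,1\}, \quad x_{ij} \in \{0,1\}
\end{aligned}
```

The height of a level equals the height of the item that initializes it, so the objective sums h_i over the initialized levels. The width constraint also forces x_ij = 0 whenever level i is not initialized.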

Reusing Results These results can be reused as follows: The original objective can be retained or it can become a constraint

Results to be exploited
- The existence of a packing ("each item is packed exactly once") becomes:
- Each item is packed exactly once or it is executed on a CPU
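In terms of the variables already introduced (x_ij and y_j for the packing, s_jl for the SW assignment), this change can be written as follows. The formula is a reconstruction under those definitions, not a copy of the original slide:

```latex
\sum_{i=1}^{j-1} x_{ij} + y_j + \sum_{l=1}^{k} s_{jl} = 1, \qquad j = 1, \dots, n
```

Each block is therefore either placed in exactly one level of the HW area or assigned to exactly one softcore.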

Results to be exploited
- The width constraint is retained
- A schedulability constraint must be added for each CPU
- Options:
  - Minimize height with the utilization constraint
  - Minimize utilization with a height constraint
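For very small instances, the first option (minimize height subject to the utilization constraint) can be checked by exhaustive search instead of an ILP solver. The sketch below is an illustration only: it assumes an EDF bound of 1 per CPU, ignores the area taken by the softcores themselves, and uses a hypothetical `(w, h, u)` block representation with blocks already sorted by non-increasing height.

```python
from fractions import Fraction


def min_height_partition(blocks, W, k):
    """Exhaustive search over HW/SW assignments with level packing for HW blocks.

    blocks: list of (w, h, u) sorted by non-increasing h, where u is the
            utilization (a Fraction) the block would need if run in SW.
    W:      width of the HW strip; k: number of CPUs.
    Returns (height, assignment); assignment[j] is ("SW", cpu) or
    ("HW", level). Returns (None, None) if no feasible assignment exists.
    """
    n = len(blocks)
    best = [None, None]
    levels = []                  # per level: [free_width, height]
    load = [Fraction(0)] * k     # utilization already assigned to each CPU
    assign = []

    def rec(j, height):
        if best[0] is not None and height >= best[0]:
            return               # cannot improve on the best solution found
        if j == n:
            best[0], best[1] = height, list(assign)
            return
        w, h, u = blocks[j]
        for cpu in range(k):     # option 1: run block j in SW on some CPU
            if load[cpu] + u <= 1:
                load[cpu] += u
                assign.append(("SW", cpu))
                rec(j + 1, height)
                assign.pop()
                load[cpu] -= u
        for idx, level in enumerate(levels):  # option 2: join an open level
            if level[0] >= w:
                level[0] -= w
                assign.append(("HW", idx))
                rec(j + 1, height)
                assign.pop()
                level[0] += w
        if w <= W:               # option 3: block j initializes a new level
            levels.append([W - w, h])
            assign.append(("HW", len(levels) - 1))
            rec(j + 1, height + h)
            assign.pop()
            levels.pop()

    rec(0, 0)
    return best[0], best[1]
```

The second option (minimize utilization with a height constraint) would swap the roles of the two quantities: the height becomes a bound checked during the search and the utilization becomes the value to minimize.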

Problem
- The available area is not rectangular!
- The area necessary for implementing the k CPUs must be considered
- Solution: start with the 1-CPU case: there are two possible partitionings
[Figure: FPGA of size W × H with a softcore of size sw × sh; remaining dimensions H - sh and W - sw]
- Duplicate all packing variables (the complexity of the problem increases correspondingly)
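One reading of the two partitionings, assuming the softcore sits in a corner of the FPGA, is that the remaining L-shaped area is cut into two rectangles either horizontally or vertically. The sketch below encodes that assumption; the function name and the return format are hypothetical.

```python
def corner_splits(W, H, sw, sh):
    """Two ways to split the free area left by one sw x sh softcore in a corner.

    Returns (horizontal, vertical); each is a list of two (width, height)
    rectangles that together cover the W x H area minus the softcore.
    """
    if sw > W or sh > H:
        raise ValueError("softcore does not fit in the FPGA")
    # horizontal cut: full-width strip beyond the core, plus the strip beside it
    horizontal = [(W, H - sh), (W - sw, sh)]
    # vertical cut: full-height strip beside the core, plus the strip beyond it
    vertical = [(W - sw, H), (sw, H - sh)]
    return horizontal, vertical
```

Under this reading, each sub-rectangle gets its own copy of the packing variables, which is where the duplication comes from.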

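Problem
- For the k-CPU case, additional assumptions are required (CPUs are packed by rows, columns, or …)
[Figure: FPGA of size W × H with k softcores of size sw × sh; remaining dimensions H - k sh and W - k sw, shown for k = 2 as H - 2 sh and W - 2 sw]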

Experimenting with GLPK
- Demo …