Physically Aware HW/SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration Sudarshan Banerjee, Elaheh Bozorgzadeh, Nikil.

Presentation transcript:

Physically Aware HW/SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration
Sudarshan Banerjee, Elaheh Bozorgzadeh, Nikil Dutt, University of California, Irvine
Presented by Abhijeet Lawande

2 Outline
- Introduction
- Target System Architecture
- Placement Issues
- Proposed Approach
- Placement Example
- Experimental Results
- Conclusions

3 Introduction
- Hardware/Software Partitioning
  - Used in systems where reconfigurable hardware (FPGA) operates in conjunction with a software processor
  - Hardware and software tasks can execute concurrently
  - Partitioning divides the task graph into HW-executed and SW-executed tasks to reduce time to completion

4 Introduction
- Partial Reconfiguration
  - ‘Columns’ of the FPGA can be configured independently
  - Hardware mapped to other columns continues to run during reconfiguration
- Partial Dynamic Reconfiguration
  - Allows reuse of FPGA resources
  - However, feasibility of placement is no longer guaranteed

5 Target System Architecture
- Software: a processor running software tasks
- Hardware: an FPGA accelerator that supports partial reconfiguration
- Shared Memory: dedicated memory used to transfer input/output data between tasks
[Figure: block diagram with General Purpose Memory, Software, Hardware (Partial RTR), and Shared Memory]

6 Target System Architecture
- Shared memory can be implemented as on-chip or off-chip dedicated memory
- Tasks mapped to the same device have negligible communication overhead
- Tasks mapped to different devices incur a HW/SW communication overhead
- Primary advantage: FPGA task placement reduces to simple linear placement
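
The communication model on this slide can be sketched in a few lines. This is only an illustration: the `Edge` type, the `edge_delay` function, and the per-edge `comm_cost` value are hypothetical names and numbers, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Edge:
    src: str          # producer task
    dst: str          # consumer task
    comm_cost: float  # assumed HW/SW transfer time through shared memory


def edge_delay(edge: Edge, mapping: Dict[str, str]) -> float:
    """Return the delay an edge adds under a given task-to-device mapping.

    `mapping` assigns each task name to "HW" or "SW". Tasks on the same
    device are treated as having negligible (zero) communication overhead.
    """
    if mapping[edge.src] == mapping[edge.dst]:
        return 0.0
    return edge.comm_cost


# Illustrative three-task chain: only the T2 -> T3 edge crosses devices.
mapping = {"T1": "HW", "T2": "HW", "T3": "SW"}
edges = [Edge("T1", "T2", 2.0), Edge("T2", "T3", 3.0)]
total = sum(edge_delay(e, mapping) for e in edges)
```

With this mapping only the edge from T2 to T3 crosses the HW/SW boundary, so it is the only one that contributes to `total`.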

7 Criticality of Task Placement
- Each HW task occupies one or more adjacent FPGA columns
- Placement feasibility is not guaranteed even with an exact algorithm
- An infeasible implementation can result from scheduling conflicts if they are not considered during placement
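
The adjacency requirement is what makes feasibility non-trivial: enough free columns in total does not imply enough adjacent free columns. A minimal sketch, assuming a hypothetical helper `first_fit_columns` and a snapshot of column occupancy at a single time instant (the presentation's approach also accounts for time, which this sketch ignores):

```python
from typing import List, Optional


def first_fit_columns(free: List[bool], width: int) -> Optional[int]:
    """Return the leftmost start index of `width` adjacent free columns, or None."""
    if width <= 0:
        raise ValueError("width must be positive")
    run = 0  # length of the current run of free columns
    for col, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == width:
            return col - width + 1
    return None


# 6-column FPGA snapshot: columns 1 and 4 are occupied, four columns are free.
free = [True, False, True, True, False, True]
two_wide = first_fit_columns(free, 2)    # fits in columns 2-3
three_wide = first_fit_columns(free, 3)  # no three adjacent free columns
```

Here a three-column task cannot be placed even though four columns are free, because no three of them are adjacent.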

8 Criticality of Task Placement Infeasible Task Graph

9 Criticality of Task Placement Feasible Task Graph

10 Criticality of Task Placement Infeasible placement

11 Heterogeneous Implementations
- FPGAs contain heterogeneous components:
  - Memory blocks
  - Hardware multipliers
  - Embedded processors
- Placement should consider multiple hardware implementations of tasks
- Problem: resources are limited and available only at specific locations on the FPGA

12 Configuration Prefetch
- Reconfiguration can take place while other HW tasks execute
- Prefetch of configuration data should be considered while scheduling tasks

13 Proposed Approach
- Exact Algorithm: Integer Linear Programming
  - Optimization technique for problems with linear constraints
  - Constraints: traditional HW/SW partitioning + contiguous placement + configuration prefetch
  - Implementation on a commercial ILP solver (CPLEX) is very slow
- Heuristic Formulation:
  - Modified KLFM approach

14 Basic KLFM Heuristic
KLFM loop:
    while (more unlocked tasks)
        select best task to switch between HW/SW
        move & lock best task
        update best partition if new partition is better
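
The loop on this slide can be sketched as a single KLFM pass. This is a minimal illustration, not the authors' implementation: `klfm_pass`, `toy_cost`, and the task times are hypothetical, and the toy cost ignores dependencies and area (it takes the larger of the summed SW times and summed HW times).

```python
from typing import Callable, Dict, Tuple

Partition = Dict[str, str]  # task name -> "HW" or "SW"


def flip(side: str) -> str:
    return "SW" if side == "HW" else "HW"


def klfm_pass(start: Partition,
              cost: Callable[[Partition], float]) -> Tuple[Partition, float]:
    """One KLFM pass: move every task exactly once, remember the best partition."""
    current = dict(start)
    best, best_cost = dict(start), cost(start)
    unlocked = set(start)
    while unlocked:
        # Cost of the partition obtained by switching a single unlocked task.
        def flipped_cost(task: str) -> float:
            trial = dict(current)
            trial[task] = flip(trial[task])
            return cost(trial)

        task = min(sorted(unlocked), key=flipped_cost)  # best task to switch
        current[task] = flip(current[task])             # move it, even if worse
        unlocked.remove(task)                           # lock it for this pass
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = dict(current), current_cost
    return best, best_cost


# Toy data (assumed): per-task execution times in software and hardware.
sw_time = {"A": 8, "B": 6, "C": 4}
hw_time = {"A": 2, "B": 3, "C": 3}


def toy_cost(p: Partition) -> float:
    sw = sum(sw_time[t] for t, side in p.items() if side == "SW")
    hw = sum(hw_time[t] for t, side in p.items() if side == "HW")
    return max(sw, hw)


best, best_cost = klfm_pass({"A": "SW", "B": "SW", "C": "SW"}, toy_cost)
```

Moves are accepted even when they worsen the cost, which is what lets the pass climb out of local minima; only the best partition seen is kept.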

15 Modified KLFM Heuristic
KLFM loop:
    while (more unlocked tasks)
        for (each unlocked task)
            for (each alternate implementation)
                calculate makespan by physically aware list scheduling
        select & lock best (task, implementation point)
        update best partition if new partition is better
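
A hedged sketch of the loop on this slide, in which each task may switch to any alternate implementation point rather than only between HW and SW. All names and numbers are assumptions for illustration; in particular `toy_makespan` is a crude stand-in for the physically aware list scheduler, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Choice = Dict[str, str]  # task name -> chosen implementation label


def modified_klfm_pass(start: Choice,
                       options: Dict[str, List[str]],
                       makespan: Callable[[Choice], float]) -> Tuple[Choice, float]:
    """One pass: try every alternate implementation of every unlocked task."""
    current = dict(start)
    best, best_span = dict(start), makespan(start)
    unlocked = set(start)
    while unlocked:
        candidates = []
        for task in sorted(unlocked):
            for impl in options[task]:
                if impl == current[task]:
                    continue
                trial = dict(current)
                trial[task] = impl
                candidates.append((makespan(trial), task, impl))
        if not candidates:
            break  # remaining tasks have no alternate implementation
        span, task, impl = min(candidates)  # best (task, implementation point)
        current[task] = impl
        unlocked.remove(task)               # lock the selected task
        if span < best_span:
            best, best_span = dict(current), span
    return best, best_span


# Assumed data: implementation label -> (execution time, FPGA columns used).
info = {
    "A": {"SW": (8, 0), "HW_small": (4, 1), "HW_fast": (2, 3)},
    "B": {"SW": (6, 0), "HW_small": (3, 1), "HW_fast": (2, 3)},
}
COLUMNS = 4  # assumed column budget


def toy_makespan(choice: Choice) -> float:
    """Stand-in evaluator: infeasible if the column budget is exceeded."""
    if sum(info[t][i][1] for t, i in choice.items()) > COLUMNS:
        return float("inf")
    sw = sum(info[t][i][0] for t, i in choice.items() if i == "SW")
    hw = sum(info[t][i][0] for t, i in choice.items() if i != "SW")
    return max(sw, hw)


options = {task: list(impls) for task, impls in info.items()}
best, best_span = modified_klfm_pass({"A": "SW", "B": "SW"}, options, toy_makespan)
```

Passing the evaluator in as a callable keeps the search loop separate from the scheduler, so a real placement-aware scheduler could be substituted for the toy one.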

16 Placement Example
[Figure: schedule over time across FPGA columns C1 to C6 and the processor (Proc), with blocks labeled E1 to E5, R3 to R5, P6, C65, and a gap, for tasks T1 to T6; accompanying table with columns Task, HW time, SW time, HW area]

17 Experiments on Feasibility
              Placement-unaware           Placement-aware
Test          T_ILP    Feasibility        T_ILP  T_HEU
tg1           10       Y                  11
tg5           25       NO                 26
Mean-value    21       Y
tg7           20       Y
tg10          27       NO                 28  29
FFT           25       Y
tg11          36       NO                 38  41
tg12          14       NO
band eq       27       Y

18 Case Study: JPEG Encoder
- Resource constraint of 8 columns
- Total area occupied by tasks: 11 columns
- Data collected for a 256x256 color image

Experiment                                               Schedule length (ms)
HW-SW partitioning, no partial RTR                       16.74
HW-SW, partial RTR                                       9.9
HW-SW, partial RTR, perfect prefetch                     9.04
Finer-grain graph                                        7.21
Multiple implementations, single heterogeneous column    6.82
Best implementation points only                          9.58
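
To make the comparison easier to read, the schedule lengths on this slide can be converted into speedups relative to the first row. The numbers are copied from the slide; the choice of baseline and the variable names are mine.

```python
# Schedule lengths in milliseconds, as listed on the slide.
schedule_ms = {
    "HW-SW partitioning, no partial RTR": 16.74,
    "HW-SW, partial RTR": 9.9,
    "HW-SW, partial RTR, perfect prefetch": 9.04,
    "Finer-grain graph": 7.21,
    "Multiple implementations, single heterogeneous column": 6.82,
    "Best implementation points only": 9.58,
}

# Assumed baseline: partitioning without partial run-time reconfiguration.
baseline = schedule_ms["HW-SW partitioning, no partial RTR"]
speedup = {name: baseline / ms for name, ms in schedule_ms.items()}

for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

Relative to that baseline, partial RTR alone gives roughly a 1.7x shorter schedule, and the multiple-implementation case roughly 2.45x.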

19 Conclusions
- Current techniques do not consider one or more of the following placement and scheduling issues:
  - Configuration prefetch
  - Feasibility of partition
  - Single reconfiguration controller bottleneck
  - Multiple implementations
  - Heterogeneous architecture
- Integer Linear Programming: exact solution, but very long run-time
- Modified KLFM heuristic: almost ideal solution, run-time of minutes for hundreds of nodes

20 References
- S. Banerjee, E. Bozorgzadeh, N. Dutt. Physically-aware hw-sw partitioning for reconfigurable architectures with partial dynamic reconfiguration. In DAC ’05: Proceedings of the 42nd annual conference on Design automation, 2005.
- R. Niemann, P. Marwedel. Hardware/software partitioning using Integer Programming. In Proc. ED&TC, 1996.

21 Extras

22 Issues in Placement
- Resource bottleneck of a single reconfiguration controller
- May not be possible to hide reconfiguration overhead for all tasks
- Cannot apply rectangular packing algorithms due to gaps in the schedule (caused by dependencies)

23 EST Computation Algorithm
    find earliest time slot where task can be placed
    reconfig_start = earliest time instant at which space and controller are available together
    if ((reconfig_start + reconfig_time) < dependency_time)
        EST = earliest time at which parent dependencies are satisfied
    else
        EST = end of reconfiguration
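
The earliest-start-time rule on this slide can be written as a small function. This is a sketch under a simplifying assumption: the target columns stay free from `space_free_at` onward and the reconfiguration controller stays free from `controller_free_at` onward, so the two are available together at the later of the two times. The function and parameter names are hypothetical.

```python
def earliest_start_time(space_free_at: float,
                        controller_free_at: float,
                        reconfig_time: float,
                        dependency_time: float) -> float:
    """Earliest time a HW task can start executing.

    `dependency_time` is the earliest time all parent dependencies are
    satisfied. Reconfiguration may begin before that (configuration
    prefetch), but execution cannot start until both are done.
    """
    reconfig_start = max(space_free_at, controller_free_at)
    reconfig_end = reconfig_start + reconfig_time
    if reconfig_end < dependency_time:
        return dependency_time  # reconfiguration fully hidden by prefetch
    return reconfig_end         # task waits for reconfiguration to finish


hidden = earliest_start_time(2.0, 5.0, 3.0, 12.0)   # reconfig ends at 8 < 12
exposed = earliest_start_time(2.0, 5.0, 3.0, 6.0)   # reconfig ends at 8 >= 6
```

The two branches are equivalent to taking the maximum of the reconfiguration end time and the dependency time.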