Politecnico di Milano, Italy

Name: Politecnico di Milano, Italy
Uploaded: 2017-12-20T21:12:44+00:00
Duration: PTM13S25
Channel: Larry Lin
Description: Politecnico di Milano, Italy

SMASH: A Heuristic Methodology for Designing Partially Reconfigurable MPSoCs
Riccardo Cattaneo, Christian Pilato, Gianluca C. Durelli, Marco D. Santambrogio and Donatella Sciuto
Politecnico di Milano, Italy
IEEE International Symposium on Rapid System Prototyping – Montreal, Canada – October 4, 2013

What is an FPGA?
A hardware device that can be customized after fabrication to execute a specific functionality.
Distinct hardware blocks "intrinsically" run in parallel on the device.
Heterogeneous grid of interconnected components: look-up tables (LUTs), block RAMs (BRAMs), digital signal processors (DSPs), switch matrices, input/output blocks (IOBs), etc.
Resources can be reused by reconfiguring part of the logic at run time (partial reconfiguration).

Heterogeneous SoCs with FPGAs
Highly coupled heterogeneous systems.
Zynq platform: dual-core ARM Cortex-A9 tightly coupled with Xilinx Artix-7 FPGA fabric.
High-speed, low-latency reconfigurable interconnect.
Avnet ZedBoard (Zynq-7000-based development board).
Figure: coarse-grained overview of the Zynq-7000 All Programmable SoC.

Design Challenges and Motivation
The hardware engineer needs to:
  partition the application into blocks (partitioning)
  determine which parts are better executed in hardware (mapping and scheduling)
  generate the system (architecture refinement)
Partial reconfiguration allows reusing the same logic across different tasks:
  more tasks can be ported to hardware
  significant overhead must be taken into account
The steps are strictly interdependent!
Figure labels: INPUT, SMASH.

SMASH: Proposed Methodology
Design Space Exploration determines the proper mapping and scheduling.
Architecture Refinement customizes the architectural template to derive the corresponding platform.

Mapping and Scheduling
Input:
  Task graph (DAG)
  Architectural template: identifies resource constraints
  Implementations: list of different trade-offs in terms of performance and resources
Output:
  Implementation and component for each task
  Order of execution

Implementation vs. Component
Each task can have multiple alternative implementations on the same component.
  Faster implementations usually require more resources.
Some tasks can share an implementation to execute the same functionality multiple times.
  Hardware reuse: no reconfiguration is required.
An implementation relates to functionality and resources.
A component relates to where the task is actually executed (processor or hardware module).

SMASH: Execution Overview
SMASH: Simultaneous MApping and Scheduling Heuristic.
Each SMASH iteration: generate trace, schedule trace, evaluate metrics, store solution.
Termination? If no, run another iteration; if yes, return the best solution.
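The iteration above can be sketched as a plain loop. This is only a minimal illustration: the function name `smash_loop`, the three callables, and the fixed iteration budget used as termination criterion are assumptions, not taken from the slides.

```python
from typing import Any, Callable, Tuple


def smash_loop(
    generate_trace: Callable[[], Any],
    schedule_trace: Callable[[Any], Any],
    evaluate: Callable[[Any], float],
    max_iters: int = 50,
) -> Tuple[Any, float]:
    """Run generate/schedule/evaluate/store until the budget is spent."""
    best_trace, best_cost = None, float("inf")
    # Assumed termination criterion: a fixed number of iterations.
    for _ in range(max_iters):
        trace = generate_trace()          # generate trace
        schedule = schedule_trace(trace)  # schedule trace
        cost = evaluate(schedule)         # evaluate metrics (lower is better)
        if cost < best_cost:              # store solution
            best_trace, best_cost = trace, cost
    return best_trace, best_cost          # return best solution
```

Passing the three steps as callables keeps the loop independent of how traces are generated or evaluated, which matches the idea of plugging in different evaluation methods.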

Exploring Mapping and Scheduling
Exploration is based on the Serial Generation Scheme (SGS).
  Constructive approach to better handle design constraints.
  A decision is not taken if it would lead to a constraint violation.
Different combinations of mapping and scheduling:
  Each decision maps a task to an implementation and a processing element.
  The order of selection gives the priority values for resolving scheduling conflicts on the resources.
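A constructive generation of this kind might look like the sketch below. It assumes a deliberately simplified resource model (a single area budget, each decision adding its own area, no reuse) and a uniform random choice among admissible decisions; the names `generate_trace`, `options` and `area_budget` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Option = Tuple[str, str, int]      # (component, implementation, area)
Decision = Tuple[str, str, str]    # (task, component, implementation)


def generate_trace(
    tasks: List[str],
    options: Dict[str, List[Option]],
    area_budget: int,
    rng=random,
) -> List[Decision]:
    """Build a trace one decision at a time, skipping inadmissible ones."""
    trace: List[Decision] = []
    used = 0
    for task in tasks:
        # A decision is admissible only if it does not exceed the budget.
        admissible = [o for o in options[task] if used + o[2] <= area_budget]
        if not admissible:
            raise ValueError(f"no admissible decision for task {task}")
        comp, impl, area = rng.choice(admissible)  # uniform choice (assumption)
        used += area
        trace.append((task, comp, impl))
    return trace
```

Because inadmissible decisions are filtered before the choice is made, every trace that the function returns satisfies the budget by construction.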

Ant Colony Optimization
The proposed approach is based on Ant Colony Optimization (ACO) to limit infeasible solutions.
Cooperative behavior of the ants while searching:
  At each step the ant has several possibilities and takes a stochastic decision, composing a trace.
  Stochastic principles guarantee exploration (a probability is generated for each admissible decision at each step).
  Feedback guarantees the exploitation of good parts of the solutions.

Algorithm Overview
Pseudo-code of the proposed ACO-based exploration (figure annotations):
  Exploration: generating the trace
  Mapping decision
  Exploitation: updating global information
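The exploitation step, updating the global information, is commonly implemented in ACO as evaporation followed by reinforcement of the best solution's decisions. The sketch below follows that common scheme; the `evaporation` and `reward` parameters and the function name are assumptions, and the slides do not state the exact update rule used by SMASH.

```python
from typing import Dict, Hashable, Iterable


def update_global_info(
    G: Dict[Hashable, float],
    best_trace: Iterable[Hashable],
    evaporation: float = 0.1,
    reward: float = 1.0,
) -> Dict[Hashable, float]:
    """Decay all feedback values, then reinforce the best trace's decisions."""
    for decision in G:
        G[decision] *= 1.0 - evaporation   # evaporation: forget old feedback
    for decision in best_trace:
        G[decision] = G.get(decision, 0.0) + reward  # reinforcement
    return G
```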

Stochastic Selection Process
At each decision point d, the probability of assigning a candidate j (task/communication) to an implementation point i (implementation + processing element) combines two terms:
  Global information G (global heuristic): feedback information, i.e. the probability that the decision leads to a good solution.
  Local heuristic L: problem-specific hint, "adjusted" by the global heuristic if wrong.
Roulette wheel and extraction of a combination (i, j).
A probability is generated iff the resources required by the resulting PEs can be satisfied by the architecture.
There is always the possibility of adding a new PE or reusing an existing one (platform customization).
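A roulette-wheel selection over the admissible combinations could be sketched as follows. The weighting G^alpha * L^beta is the form commonly used in ACO and is an assumption here, as are the `alpha`, `beta` and `feasible` parameters; the slides do not preserve the exact expression.

```python
import random
from typing import Callable, Dict, Hashable, Iterable


def selection_probabilities(
    candidates: Iterable[Hashable],
    G: Dict[Hashable, float],
    L: Dict[Hashable, float],
    alpha: float = 1.0,
    beta: float = 1.0,
    feasible: Callable[[Hashable], bool] = lambda c: True,
) -> Dict[Hashable, float]:
    """Normalized probabilities, generated only for feasible candidates."""
    weights = {c: (G[c] ** alpha) * (L[c] ** beta) for c in candidates if feasible(c)}
    total = sum(weights.values())
    if total == 0:
        return {}
    return {c: w / total for c, w in weights.items()}


def roulette_wheel(probs: Dict[Hashable, float], rng=random) -> Hashable:
    """Draw one candidate with probability proportional to its share."""
    if not probs:
        raise ValueError("no admissible candidate")
    r = rng.random()
    acc = 0.0
    chosen = None
    for chosen, p in probs.items():
        acc += p
        if r < acc:
            return chosen
    return chosen  # guard against floating-point rounding on the last slice
```

The `feasible` predicate stands in for the resource check: candidates whose resulting PEs cannot be accommodated simply receive no probability.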

More about SMASH (Simultaneous MApping and Scheduling Heuristic)
SMASH iteration: generate trace, schedule trace, evaluate metrics, store solution; on termination, return the best solution.

Trace Generation and Evaluation
Evaluation is performed only on the complete trace.
  Updated version of the original TG, augmented with communications and reconfigurations.
  Reconfiguration is taken into account from the early stages of the design process.
Possibility to include different evaluation methods:
  Analytical estimations vs. TLM simulations.
Decisions composing the best solution are reinforced.
  As time goes on, the best trace is identified.
[Trace generation] Static analysis, TLM simulation (see SoC), possibility of modifying the TG to integrate more detailed information (different communications, reconfigurability).
Possibility to evaluate and compare alternative approaches.

Scheduling Definition
Input:
  Task graph (DAG)
  Trace: ordered list of mapping decisions (task-component-implementation)
Output:
  Start/end time estimations for each task
Goal:
  Reduce total execution time

Task  Component  Implementation
A     p1         impl_0
B     p2         impl_1
C                impl_2
D     p3         impl_3
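One simple way to turn a trace into start/end times is a list-scheduling pass in trace order. The sketch below assumes that the trace order is also a valid topological order of the task graph and that each implementation has a known duration; `preds`, `duration` and the function name are hypothetical, and communications and reconfigurations are ignored.

```python
from typing import Dict, Iterable, List, Tuple

Decision = Tuple[str, str, str]  # (task, component, implementation)


def schedule_trace(
    trace: List[Decision],
    preds: Dict[str, Iterable[str]],
    duration: Dict[str, int],
) -> Dict[str, Tuple[int, int]]:
    """Assign (start, end) to each task, visiting tasks in trace order."""
    comp_free: Dict[str, int] = {}          # time at which each component is idle
    times: Dict[str, Tuple[int, int]] = {}
    for task, comp, impl in trace:
        # A task is ready once all of its predecessors have finished.
        ready = max((times[p][1] for p in preds.get(task, ())), default=0)
        # Earlier trace entries win conflicts on a shared component.
        start = max(ready, comp_free.get(comp, 0))
        end = start + duration[impl]
        times[task] = (start, end)
        comp_free[comp] = end
    return times
```

The total execution time to minimize is then the largest end time over all tasks.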

Scheduling: Methodology Overview
SMASH scheduler steps: create extended task graph, actual scheduling (assign times), evaluate metrics.
Data: task graph and trace (input), extended task graph (intermediate), metrics (output).

Extended TG: Communications
Adding explicit tasks based on the communication topology

Extended TG: Reconfigurations
A reconfiguration task is introduced iff:
  two processing tasks are mapped on the same component, and
  their implementations are different, i.e., the module cannot be reused.
Insertion of a reconfiguration task:
  New edges are introduced from all WRITEs exiting the source processing task to the reconfiguration.
  New edges are introduced from the reconfiguration to all the READs entering the target processing task.
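The insertion rule can be sketched on a graph stored as a set of edges. The node naming (`reconf_<src>_<dst>`), the `writes`/`reads` lookup tables and the function name are assumptions made for illustration; the sketch also does not distinguish processors from hardware modules.

```python
from typing import Dict, Iterable, List, Set, Tuple

Decision = Tuple[str, str, str]  # (task, component, implementation)
Edge = Tuple[str, str]


def add_reconfigurations(
    trace: List[Decision],
    edges: Set[Edge],
    writes: Dict[str, Iterable[str]],
    reads: Dict[str, Iterable[str]],
) -> Set[Edge]:
    """Return the edge set extended with reconfiguration tasks.

    writes[t] lists the WRITE nodes exiting task t,
    reads[t] lists the READ nodes entering task t.
    """
    last: Dict[str, Tuple[str, str]] = {}  # component -> (task, impl) seen last
    extended = set(edges)
    for task, comp, impl in trace:
        if comp in last and last[comp][1] != impl:
            src = last[comp][0]
            reconf = f"reconf_{src}_{task}"
            # WRITEs of the source task must complete before reconfiguring.
            extended |= {(w, reconf) for w in writes.get(src, ())}
            # READs of the target task must wait for the reconfiguration.
            extended |= {(reconf, r) for r in reads.get(task, ())}
        last[comp] = (task, impl)
    return extended
```

Two consecutive tasks that share both component and implementation add no edges, which corresponds to hardware reuse without reconfiguration.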

Extended TG: Reconfigurations
Task  Component  Implementation
A     p1         impl_0
B     p2         impl_1
C                impl_2
D     p3         impl_3

Trace Evaluation
Possibility to integrate different policies to generate the corresponding scheduling.

Architecture Refinement
The actual platform instance is derived from the resulting decisions:
  Hardware modules with only one task assigned are converted into static IP blocks.
  Hardware modules with more than one task assigned are represented as reconfigurable regions.
Integration with the generation of the run-time manager to manage reconfigurations.
  Still work in progress and manually performed.
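The classification rule for hardware modules is simple enough to sketch directly. The function name, the `hw_components` argument and the string labels are hypothetical; the slides state that this step is still performed manually.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Decision = Tuple[str, str, str]  # (task, component, implementation)


def refine_architecture(
    trace: List[Decision],
    hw_components: Iterable[str],
) -> Dict[str, str]:
    """Label each used hardware module by the number of tasks mapped on it."""
    hw = set(hw_components)
    tasks_on: Dict[str, List[str]] = defaultdict(list)
    for task, comp, _impl in trace:
        if comp in hw:                      # processors are left untouched
            tasks_on[comp].append(task)
    return {
        comp: "static_ip" if len(tasks) == 1 else "reconfigurable_region"
        for comp, tasks in tasks_on.items()
    }
```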

Experimental Evaluation
Synthetic benchmarks (TGFF):
  Focus on the scalability of the approach.
  Possibility to evaluate different task graph patterns.
Resulting systems (platform instance and extended task graph with mapping/scheduling decisions) are converted into virtual platforms:
  Validation of the different solutions, assuming correctness of the execution.
  Simulations performed with Synopsys Platform Architect.
  VPU performance annotations extracted from the tasks' implementations.

Experimental Setup
Three different classes of experiments:
  Static: the FPGA area is divided into a set of up to KS static IP cores (no partial reconfiguration).
  Mixed: both IP cores and reconfigurable regions can be used, with an upper bound of KM IPs and RM reconfigurable regions.
  Reconfigurable: architectures with no more than KR regions.
Reconfigurable regions can also be deployed as static cores in the final architecture if only one task is assigned to them.

Experimental Results
Small task graphs cannot benefit from reconfiguration.
Large task graphs are affected by communication overhead.
[Table: #Task against IPs, RRs, HW tasks and #Reconf for the static, mixed and reconfigurable experiments]

Conclusions and Future Work
SMASH is an automated methodology to design reconfigurable systems.
  It determines the mapping and scheduling of the different tasks.
  It allows customizing the architectural template.
Future work:
  Integration of floorplanning procedures to compute and validate physical constraints of the blocks.
  Automatic generation of the platform specification.

End…
