1
A Tool for Partitioning and Pipelined Scheduling of Hardware-Software Systems
Karam S Chatha and Ranga Vemuri
Department of ECECS, University of Cincinnati
{kchatha,ranga}@ececs.uc.edu
2
Organization of Talk
- Introduction
- Overview of Tool
- Codesign Partitioner
- Pipelined Scheduler
- Results
- Conclusion
3
Introduction
Motivation: The throughput of a loop-oriented HW-SW application can be maximized by obtaining a pipelined implementation.
Objective: To obtain a pipelined implementation of the application on the codesign architecture such that:
- The throughput constraint is satisfied
- The HW area constraint is satisfied
- The number of pipeline stages is minimized
- The increase in memory requirement is minimized
4
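Introduction: Architecture and Task Graph
Architecture: a SW processor with a local memory, a HW co-processor, and a shared memory. The local memory is used for SW-SW communication; the shared memory is used for SW-HW, HW-SW and HW-HW communication.
[Figure: example task graph with four tasks (A, B, C, D) and 10 data items per dependence. Each task is annotated with its SW execution time (S), its HW execution time (H) and its HW resources: S = 225 ns, H = 175 ns (8 +, …); S = 400 ns, H = 150 ns (4 *, 8 +, …); S = 100 ns, H = 400 ns (3 *, 3 +, …); S = 200 ns, H = 100 ns (4 *, 8 -, …).]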
5
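Introduction: Pipelined Design
[Figure: pipelined design of the example task graph in three stages (Stage 0, Stage 1, Stage 2), and its steady-state schedule (time = 365 ns) of tasks A, B, C and D on the HW co-processor, the shared memory and the SW processor, with shared-memory reads (R) and writes (W); annotated with 70 data items.]
Assumptions:
- SW-SW communication time is taken into account in the SW runtime of the task, hence it is not shown.
- The HW co-processor cannot execute tasks in parallel.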
6
Some Definitions
A pipelined design is characterized by its initiation interval. The initiation interval (II) is the time difference between the starts of two consecutive iterations of the steady state.
Given a partitioned task graph, there exists a theoretical lower bound on the II of its pipelined schedule, called the Minimum Initiation Interval (MII). For a directed acyclic task graph, the MII is given by:
MII = max(Sum_hw, Sum_sw)
where Sum_hw is the sum of execution times of tasks bound to HW and Sum_sw is the sum of execution times of tasks bound to SW. The bound holds because each processor executes its tasks one at a time and must execute every task bound to it once per iteration.
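The MII formula can be sketched in Python as follows. The data layout (each task mapped to a pair of SW and HW execution times) and the task names and times in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
def minimum_initiation_interval(exec_time, binding):
    """MII = max(Sum_hw, Sum_sw) for a partitioned, acyclic task graph.

    exec_time: task -> (sw_time, hw_time)   (hypothetical data layout)
    binding:   task -> "HW" or "SW"
    """
    # Each processor runs its tasks one after another, once per iteration,
    # so the busier processor bounds the initiation interval from below.
    sum_hw = sum(exec_time[t][1] for t, p in binding.items() if p == "HW")
    sum_sw = sum(exec_time[t][0] for t, p in binding.items() if p == "SW")
    return max(sum_hw, sum_sw)


# Hypothetical example: t1 on HW, t2 and t3 on SW.
times = {"t1": (300, 100), "t2": (200, 150), "t3": (100, 250)}
print(minimum_initiation_interval(times, {"t1": "HW", "t2": "SW", "t3": "SW"}))  # 300
```

Here Sum_hw = 100 and Sum_sw = 200 + 100 = 300, so the SW processor determines the bound.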
7
HW-SW Codesign
Inputs: task graph, architecture, throughput and area constraints.
1. Partition the design to satisfy the throughput and area constraints, and calculate the MII. If the constraints are not satisfied, the output is "Unable to design with given constraints".
2. Set II = MII.
3. Obtain a pipelined schedule which executes in II time. The scheduler satisfies the throughput constraint, minimizes the number of pipeline stages and minimizes the increase in memory requirements.
4. If a schedule is found, the output is a successful design. Otherwise, increase II: if II exceeds the throughput constraint, the output is "Unable to design with given constraints"; else return to step 3.
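The outer loop of this flow (start at II = MII, relax II until a schedule is found or the throughput constraint is exceeded) can be sketched as below. This is a minimal sketch that assumes the partition has already been computed; `try_schedule` is a hypothetical stand-in for the pipelined scheduler, and the step size `ii_step` is an assumption, since the slides do not say how much II is increased.

```python
from typing import Any, Callable, Optional, Tuple


def codesign_flow(mii: float, ii_constraint: float,
                  try_schedule: Callable[[float], Optional[Any]],
                  ii_step: float = 1.0) -> Optional[Tuple[float, Any]]:
    """Outer loop of the codesign flow for an already partitioned design.

    try_schedule(ii) (hypothetical) returns a schedule that executes in
    `ii` time, or None if it cannot find one.
    ii_step is a hypothetical increment; the slides do not state one.
    """
    if mii > ii_constraint:
        return None  # the partition cannot meet the throughput constraint
    ii = mii  # start at the theoretical lower bound
    while ii <= ii_constraint:
        schedule = try_schedule(ii)
        if schedule is not None:
            return ii, schedule  # successful design
        ii += ii_step  # relax the initiation interval and retry
    return None  # unable to design with the given constraints


# Hypothetical scheduler that only succeeds once II reaches 330 ns.
print(codesign_flow(325, 400, lambda ii: "schedule" if ii >= 330 else None, ii_step=5))
# (330, 'schedule')
```

Starting from the MII means the first schedule found has the smallest II this search can reach for the given step size.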
8
HW-SW Partitioner
The partitioner uses a branch and bound algorithm.
The initial solution tries to minimize the MII:
- The suitability of a task to be assigned to HW is given by:
- Sort the tasks in descending order of their suitabilities.
- Assign tasks to HW and SW alternately from the front and back of the sorted list, so that Sum_hw and Sum_sw remain balanced.
We also apply heuristics to effectively limit the search space of the algorithm.
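The initial-solution heuristic can be sketched as follows. The slide's suitability formula is not reproduced here, so this sketch ASSUMES suitability is the HW speedup sw_time / hw_time; it also reads "alternately ... so that Sum_hw and Sum_sw remain balanced" as: whichever processor is currently less loaded takes the next task from its end of the sorted list. Both are assumptions, and the example times are hypothetical.

```python
def initial_partition(exec_time):
    """Greedy initial HW-SW partition that keeps Sum_hw and Sum_sw balanced.

    exec_time: task -> (sw_time, hw_time)
    Suitability is ASSUMED to be the speedup sw_time / hw_time.
    """
    order = sorted(exec_time, key=lambda t: exec_time[t][0] / exec_time[t][1],
                   reverse=True)  # most HW-suitable task first
    binding = {}
    sum_hw = sum_sw = 0
    front, back = 0, len(order) - 1
    while front <= back:
        if sum_hw <= sum_sw:
            # HW is the less loaded side: take the most suitable remaining task.
            task = order[front]
            front += 1
            binding[task] = "HW"
            sum_hw += exec_time[task][1]
        else:
            # SW is the less loaded side: take the least HW-suitable remaining task.
            task = order[back]
            back -= 1
            binding[task] = "SW"
            sum_sw += exec_time[task][0]
    return binding, max(sum_hw, sum_sw)  # partition and its MII


times = {"a": (400, 100), "b": (300, 150), "c": (100, 200), "d": (120, 300)}
print(initial_partition(times))
# ({'a': 'HW', 'd': 'SW', 'b': 'HW', 'c': 'SW'}, 250)
```

Such a solution would serve only as the starting bound for the branch and bound search, not as the final partition.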
9
HW-SW Partitioner: Area Estimation
The resources required by tasks are divided into two types:
1. Shared: adders, subtractors, multipliers, dividers
2. Unshared: interconnect and controller
The shared resource area is estimated by taking the union of the shared resources required by all the HW tasks.
The unshared resource area is estimated by adding the areas of the unshared resources of all the HW tasks.
The total area is the sum of the shared and unshared resource areas.
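A minimal sketch of this area estimate, using Python's `collections.Counter` as a multiset. The sketch reads "union" as the per-type maximum of the unit counts, which fits the assumption that the HW co-processor executes one task at a time and can therefore reuse functional units across tasks. The per-unit areas and the example tasks are hypothetical.

```python
from collections import Counter


def estimate_area(hw_tasks, unit_area):
    """Area estimate for the tasks bound to HW.

    hw_tasks:  task -> (Counter of shared functional units, unshared area)
    unit_area: functional-unit type -> area of one unit (hypothetical values)
    """
    # "Union" of shared resources, read here as the per-type maximum: the HW
    # co-processor runs one task at a time, so units can be reused across tasks.
    union = Counter()
    for shared, _ in hw_tasks.values():
        union |= shared
    shared_area = sum(unit_area[unit] * count for unit, count in union.items())
    # Interconnect and controller are not shared, so their areas simply add up.
    unshared_area = sum(unshared for _, unshared in hw_tasks.values())
    return shared_area + unshared_area


tasks = {"x": (Counter({"+": 8, "*": 4}), 50),
         "y": (Counter({"+": 3, "*": 6}), 30)}
print(estimate_area(tasks, {"+": 10, "*": 40}))  # 400
```

The union keeps 8 adders and 6 multipliers (80 + 240 = 320), and the unshared areas add 50 + 30 = 80.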
10
Pipelined Scheduling
1. Try to obtain a task schedule which executes in II time (use list scheduling).
2. If a schedule is found, the result is success.
3. Otherwise, select a dependency to retime (use RECOD Step 1). If no dependency is found, the result is failure.
4. Apply the retiming transformation (use RECOD Step 2) and return to step 1.
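The schedule-or-retime loop can be sketched as below. This is a simplified sketch, not the RECOD implementation: communication time is ignored, execution times are given for each task's bound processor, list-scheduling priority is the longest intra-iteration path to a sink, and both RECOD steps are replaced by hypothetical stand-ins (Step 1 uses only the ILD, heterogeneous-processor and longest-path criteria; Step 2 simply delays the chosen successor and its intra-iteration descendants by one iteration instead of solving the memory-minimizing partition). The iteration-index convention follows the definitions on the next slide, `r` and `max_retimings` are names introduced here, and the example values are hypothetical.

```python
def longest_tail(exec_time, deps):
    """Longest path (sum of execution times) from each task to a sink,
    following only intra-iteration dependencies (distance 0)."""
    succ = {t: [v for (u, v), d in deps.items() if u == t and d == 0] for t in exec_time}
    memo = {}

    def tail(t):
        if t not in memo:
            memo[t] = exec_time[t] + max((tail(s) for s in succ[t]), default=0)
        return memo[t]

    return {t: tail(t) for t in exec_time}


def list_schedule(exec_time, binding, deps, ii):
    """Start times of one steady-state iteration, or None if it exceeds II."""
    prio = longest_tail(exec_time, deps)
    preds = {t: [u for (u, v), d in deps.items() if v == t and d == 0] for t in exec_time}
    start, finish = {}, {}
    free = {"HW": 0, "SW": 0}  # each processor runs one task at a time
    while len(finish) < len(exec_time):
        ready = [t for t in exec_time
                 if t not in finish and all(p in finish for p in preds[t])]
        task = max(ready, key=lambda t: prio[t])  # highest priority first
        begin = max([free[binding[task]]] + [finish[p] for p in preds[task]])
        start[task], finish[task] = begin, begin + exec_time[task]
        free[binding[task]] = finish[task]
    return start if max(finish.values()) <= ii else None


def pipelined_schedule(exec_time, binding, deps, ii, max_retimings=20):
    """Retime until the steady state fits in II time (simplified loop)."""
    deps = dict(deps)  # (u, v) -> dependence distance
    r = {t: 0 for t in exec_time}  # iteration index of every task
    for _ in range(max_retimings + 1):
        sched = list_schedule(exec_time, binding, deps, ii)
        if sched is not None:
            return sched, r, deps  # success
        # Stand-in for RECOD Step 1: an intra-loop dependency between HW and SW
        # whose predecessor heads the longest remaining path.
        prio = longest_tail(exec_time, deps)
        cands = [e for e, d in deps.items() if d == 0 and binding[e[0]] != binding[e[1]]]
        if not cands:
            return None  # failure: no dependency left to retime
        u, v = max(cands, key=lambda e: prio[e[0]])
        # Stand-in for RECOD Step 2: delay v and everything that depends on it
        # within the iteration by one iteration.
        moved, stack = set(), [v]
        while stack:
            x = stack.pop()
            if x not in moved:
                moved.add(x)
                stack.extend(b for (a, b), d in deps.items() if a == x and d == 0)
        for x in moved:
            r[x] -= 1
        # New distance d + r(a) - r(b) after the change in r.
        deps = {(a, b): d + (b in moved) - (a in moved) for (a, b), d in deps.items()}
    return None


sched, r, dist = pipelined_schedule({"A": 100, "B": 200, "C": 150},
                                    {"A": "HW", "B": "SW", "C": "HW"},
                                    {("A", "B"): 0, ("B", "C"): 0}, ii=300)
print(sched, r)  # {'B': 0, 'C': 0, 'A': 150} {'A': 0, 'B': -1, 'C': -2}
```

In this example the chain A → B → C needs 450 ns unpipelined; after two retimings the three tasks run from three different original iterations (three pipeline stages) and one steady-state iteration finishes in 250 ns.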
11
Some Definitions
In the following explanation, we call the task graph before the retiming transformation the original loop, and the task graph after the transformation the steady state. To apply the retiming transformation, we associate an iteration index with every task and a dependence distance with every dependency.
- Iteration index of a task u: in the I-th iteration of the steady state, the instance of task u belonging to iteration (I + iteration index of u) of the original loop is executed.
- Dependence distance of a dependency uv: the data produced by task u is consumed by task v that many iterations later.
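The two definitions are linked: retiming changes a dependence distance by the difference of the iteration indices. The slide's original symbols are not recoverable, so this sketch uses hypothetical names (`d_orig`, `r_u`, `r_v`). In the original loop, v in iteration k consumes data produced by u in iteration k - d_orig. In steady-state iteration I, v executes original iteration I + r_v, so it needs u's output from original iteration I + r_v - d_orig, which u executes in steady-state iteration I + r_v - d_orig - r_u. The gap is therefore d_orig + r_u - r_v steady-state iterations.

```python
def steady_state_distance(d_orig, r_u, r_v):
    """Dependence distance of u -> v after retiming.

    d_orig: distance of u -> v in the original loop
    r_u, r_v: iteration indices (hypothetical names); in steady-state
              iteration I, task x executes its instance from original
              iteration I + r_x.
    """
    # v needs data that u produced (d_orig + r_u - r_v) steady-state iterations earlier.
    return d_orig + r_u - r_v


def is_legal(d_orig, r_u, r_v):
    """A retiming is legal only if no data is consumed before it is produced."""
    return steady_state_distance(d_orig, r_u, r_v) >= 0


print(steady_state_distance(0, 0, -1))  # 1
print(is_legal(0, -1, 0))               # False
```

Giving a consumer a smaller iteration index than its producer turns an intra-loop dependency (distance 0) into an inter-iteration one, which is what lets the two tasks overlap in the steady state.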
12
RECOD Step 1: Select a dependency to retime
1. The dependency is an intra loop dependency (ILD).
2. The dependency is between tasks bound to heterogeneous processors.
3. The dependency's predecessor task belongs to the longer constraining path.
4. The dependency represents the least number of data items transferred.
[Figure: example task graph with tasks A to I bound to HW and SW; two dependencies are annotated Var = 20 and Var = 10.]
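One way to apply the four criteria in order is sketched below. It ASSUMES criteria 1 and 2 act as filters, criterion 3 is measured as the longest intra-iteration path passing through the predecessor task, and criterion 4 breaks ties. These interpretations, the data layout, and the example values are hypothetical.

```python
def select_dependency(exec_time, binding, deps):
    """Pick a dependency to retime by applying the four criteria in order.

    exec_time: task -> execution time on its bound processor
    deps: (u, v) -> (dependence distance, data items transferred)
    """
    ild = [e for e, (d, _) in deps.items() if d == 0]  # criterion 1
    preds = {t: [u for (u, v) in ild if v == t] for t in exec_time}
    succs = {t: [v for (u, v) in ild if u == t] for t in exec_time}
    head, tail = {}, {}

    def longest_to(t):  # longest path ending at t, including t
        if t not in head:
            head[t] = exec_time[t] + max((longest_to(p) for p in preds[t]), default=0)
        return head[t]

    def longest_from(t):  # longest path starting at t, including t
        if t not in tail:
            tail[t] = exec_time[t] + max((longest_from(s) for s in succs[t]), default=0)
        return tail[t]

    def through(t):  # longest intra-iteration path that passes through t
        return longest_to(t) + longest_from(t) - exec_time[t]

    cands = [(u, v) for (u, v) in ild if binding[u] != binding[v]]  # criterion 2
    if not cands:
        return None
    # Criterion 3 (longer path through the predecessor) first, then
    # criterion 4 (fewer data items) as the tie-breaker.
    return max(cands, key=lambda e: (through(e[0]), -deps[e][1]))


times = {"A": 100, "B": 200, "C": 100, "D": 200}
bind = {"A": "HW", "B": "SW", "C": "HW", "D": "SW"}
print(select_dependency(times, bind, {("A", "B"): (0, 20), ("C", "D"): (0, 10)}))  # ('C', 'D')
```

Both paths in the example are 300 ns long, so the dependency transferring fewer data items is chosen.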
13
RECOD Step 2: Partition to minimize the increase in memory requirements
[Figure: retiming transformation of the example task graph (tasks A to I); a cutset divides the tasks into Set R, Set P and Set S, and the cost function for the partitioner is shown.]
14
JPEG Case Study
We specified the JPEG image compression algorithm as a task graph with 12 tasks. We then obtained pipelined codesign implementations by specifying different constraints on the II and the HW area.
15
Execution Time
We evaluated the runtime of the tool by invoking it on 50 random task graphs and searching for optimal HW-SW partitions.
16
Percentage deviation of initial solution from final
We calculated the percentage deviation of the II of the initial partition from the II of the final partition. The average percentage deviation was 8.4%.
17
Percentage deviation of final result from optimal
We compared the II obtained by the tool with the minimum MII obtained during design space exploration. The minimum MII is a lower bound on the global optimum for a particular task graph. The solution obtained by our tool was, on average, within 2.2% of this lower bound, and therefore within 2.2% of the global optimum.
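The two percentage deviations are presumably computed as below (the notation is introduced here and is not taken from the slides). The inequality shows why a deviation measured against the minimum MII also bounds the deviation from the global optimum II_opt.

```latex
\[
\mathrm{Dev}_{\mathrm{init}} = \frac{II_{\mathrm{init}} - II_{\mathrm{final}}}{II_{\mathrm{final}}} \times 100\%,
\qquad
\mathrm{Dev}_{\mathrm{opt}} = \frac{II_{\mathrm{tool}} - MII_{\min}}{MII_{\min}} \times 100\%
\]
\[
MII_{\min} \le II_{\mathrm{opt}} \le II_{\mathrm{tool}}
\;\Longrightarrow\;
\frac{II_{\mathrm{tool}} - II_{\mathrm{opt}}}{II_{\mathrm{opt}}}
= \frac{II_{\mathrm{tool}}}{II_{\mathrm{opt}}} - 1
\le \frac{II_{\mathrm{tool}}}{MII_{\min}} - 1
= \frac{II_{\mathrm{tool}} - MII_{\min}}{MII_{\min}}
\]
```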
18
Conclusion
- The tool can optimize the throughput, area, number of pipeline stages and memory requirements of a pipelined HW-SW system.
- The tool can obtain solutions for task graphs with up to 30 nodes within a short period of time.
- Although it assumes a single SW processor and a single HW co-processor, the technique can be extended to multiple-processor architectures.
- The limitation of the tool is its inability to handle large task graphs (> 30 nodes) in a reasonable amount of time. A time-out option for the branch and bound partitioner can overcome this limitation.