Configuration Reusing in On-Line Task Scheduling for Reconfigurable Computing Systems
By M.M. Bassiri and H. S. Shahhoseini
Elisha Colmenar
School of Engineering, Engineering Systems and Computing
Outline
- Introduction
- Background
- Real-Time Application
- Defining the Problem
- Proposed Algorithm
- Simulation Results
- Critiques
- Conclusion
Introduction
- Reconfigurable Computing (RC) systems can:
  - Be reconfigured at runtime
  - Support partial reconfigurability
  - Execute tasks in a true multitasking manner
- Reconfigurable Operating System
- Reconfigurable Processing Unit (RPU)
- Task scheduler that manages resources for tasks and assigns them to the RPU
- This paper targets 1D area scheduling.
We all know what RC is. RC systems can be reconfigured at runtime and support partial reconfiguration, which lets us execute tasks in a true multitasking manner. Managing such a system at runtime requires a reconfigurable operating system, and every operating system needs an efficient task scheduler. One of the scheduler's main responsibilities is the management of resources.
Background
- Let task Ti = (wi, hi, ei, ai, di, ri), where:
  - wi = width
  - hi = height
  - ei = execution time
  - ai = arrival time
  - di = deadline
  - ri = reconfiguration time of the task
- Tasks are soft real-time and independent:
  - Soft real-time: a missed deadline is not fatal
  - Independent: no dependencies between tasks
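The task tuple above maps naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the class name `Task`, its method names, and the example values are hypothetical, and it assumes that reconfiguration runs serially before execution and is skipped entirely when a configuration is reused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task T_i = (w_i, h_i, e_i, a_i, d_i, r_i) as defined on the slide."""
    w: int  # width
    h: int  # height
    e: int  # execution time
    a: int  # arrival time
    d: int  # deadline
    r: int  # reconfiguration time

    def latest_start_time(self) -> int:
        # Latest start that still meets the deadline when no reconfiguration is needed.
        return self.d - self.e

    def finish_time(self, start: int, reuse: bool) -> int:
        # Assumption: reconfiguration precedes execution and costs nothing on reuse.
        return start + (0 if reuse else self.r) + self.e

    def meets_deadline(self, start: int, reuse: bool) -> bool:
        return self.finish_time(start, reuse) <= self.d


# Hypothetical example values, in arbitrary time units.
t = Task(w=10, h=64, e=20, a=0, d=40, r=15)
print(t.latest_start_time())              # 40 - 20 = 20
print(t.meets_deadline(10, reuse=False))  # 10 + 15 + 20 = 45 > 40, so False
print(t.meets_deadline(10, reuse=True))   # 10 + 20 = 30 <= 40, so True
```

In this example the same task misses its deadline when it must be reconfigured but meets it when an existing configuration is reused, which is the effect the paper sets out to exploit.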
Background Cont’d
- Two types of RPU area models for task mapping:
  - 2D area model: flexible 2D or partitioned 2D
  - 1D area model: flexible 1D or partitioned 1D
Real-Time Application
- A large variety of applications satisfy the stated assumptions, such as:
  - Reconfigurable co-processor implementation
  - Image and video processing
  - Cryptography
  - Telecommunication
  - Neural network implementation
Defining the Problem
- Find an efficient on-line scheduling and placement method for independent tasks on an RPU with the 1D area model.
- The main goals are to minimize:
  - Task rejection ratio (TRR)
  - Overall execution time of the tasks
  - Reconfiguration overhead (that is, reconfiguration time)
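Two of these goals can be expressed as simple metrics. The sketch below is an assumption made for illustration: it reads TRR as rejected tasks divided by all arrived tasks, and reconfiguration overhead as the sum of reconfiguration times of tasks that could not reuse a configuration. The function names are hypothetical and the slides do not give these formulas explicitly.

```python
from typing import Sequence


def task_rejection_ratio(accepted: int, rejected: int) -> float:
    """Fraction of arrived tasks that were rejected (assumed definition of TRR)."""
    total = accepted + rejected
    if total == 0:
        return 0.0  # no tasks arrived, so nothing was rejected
    return rejected / total


def total_reconfiguration_overhead(reconfig_times: Sequence[int],
                                   reused: Sequence[bool]) -> int:
    """Sum the reconfiguration time of every task that was NOT served by reuse."""
    return sum(r for r, was_reused in zip(reconfig_times, reused) if not was_reused)


print(task_rejection_ratio(accepted=45, rejected=5))                       # 0.1
print(total_reconfiguration_overhead([15, 15, 8], [False, True, False]))   # 23
```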
Proposed Algorithm
- Reusing-based scheduling (RBS)
- Figure legend: Ti = task, Ai = arrival time, Ri = reconfiguration time
In the RBS method, some of the arrived tasks are not removed from the RPU surface after they finish executing; they stay on the RPU surface as long as possible. When such a task arrives again, its configuration is already on the RPU, so the reconfiguration overhead is eliminated. However, it is impossible to keep every arrived task on the RPU, so the algorithm is extended.
Division of Tasks
- Arriving tasks are categorized:
  - Non-significant
  - Significant
    - Large overhead
    - Probability of recurrence (Poisson probability distribution)
- Existing tasks are categorized:
  - Running tasks
  - Scheduled tasks
  - Preserved tasks
Divide the arriving tasks into two groups (P2 and P1):
- Significant (P2): tasks with a large reconfiguration overhead (Oi) and a large probability of recurrence. These are preserved for configuration reuse to reduce reconfiguration overhead.
- Non-significant (P1): removed from the RPU after their completion.
Divide the existing tasks into three categories:
- Running tasks: currently executing on the RPU
- Scheduled tasks: scheduled but not yet started
- Preserved tasks: execution has been completed
Divide the RPU into three areas: free area, occupied area and preserved area.
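The significance test can be sketched as follows. The slides only say that significance depends on a large reconfiguration overhead and on a recurrence probability drawn from a Poisson distribution. The specific formula (probability of at least one arrival within a time horizon), the threshold parameters, and all names below are assumptions made for illustration, not the paper's definition.

```python
import math


def recurrence_probability(rate: float, horizon: float) -> float:
    """P(at least one arrival within `horizon`) for a Poisson process with `rate`."""
    return 1.0 - math.exp(-rate * horizon)


def is_significant(overhead: float, rate: float, horizon: float,
                   min_overhead: float, min_probability: float) -> bool:
    """A task is treated as significant (P2) only if BOTH criteria hold.

    `min_overhead` and `min_probability` are hypothetical tuning thresholds.
    """
    return (overhead >= min_overhead
            and recurrence_probability(rate, horizon) >= min_probability)


# Hypothetical numbers: overhead 20, arrival rate 0.1 per time unit, horizon 10.
print(round(recurrence_probability(0.1, 10.0), 4))   # 1 - e^-1, about 0.6321
print(is_significant(20, 0.1, 10.0, min_overhead=10, min_probability=0.5))  # True
print(is_significant(5, 0.1, 10.0, min_overhead=10, min_probability=0.5))   # False
```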
Total Scheme of RPU Partitioning
- RPU area is divided into:
  - Free area: no tasks
  - Occupied area: running tasks and scheduled tasks
  - Preserved area: preserved tasks
- This prevents surface fragmentation
Arriving tasks are divided into two groups, significant and non-significant, and existing tasks into three categories: running, scheduled and preserved. The RPU is divided into three areas, drawn in the figure as shaded and white areas. Non-significant tasks go to partition P1 and significant tasks go to partition P2. Each partition has a certain width (for example, WP1) that can be dynamically resized at runtime.
Dynamic Partitioning
- Partition Utilization (PU), where:
  - Woccupied = total width of occupied area
  - Wpreserved = total width of preserved area
  - Wfree = total width of free area on partition Pi
  - WPi = width of partition i
- Constraint for the maximum allowable (feasible) partition size.
Task Placement
- Best-fit policy
  - The smallest preserved task that can accommodate the arrived task is selected for replacement
- Least Probability of Recurrence (LPR) policy
  - Select the task that has the smallest probability of recurrence
  - It must be large enough to accommodate the new task
  - Replace the old task with the new task
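The two replacement policies above can be sketched as selection functions over the preserved tasks. This is an illustration under stated assumptions: the `PreservedTask` record, its fields, and the function names are hypothetical, and "large enough" is taken to mean that the preserved task's width is at least the new task's width in the 1D area model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreservedTask:
    name: str
    width: int                     # width occupied on the RPU (1D area model)
    recurrence_probability: float  # estimated chance the task arrives again


def best_fit(preserved: List[PreservedTask], new_width: int) -> Optional[PreservedTask]:
    """Pick the smallest preserved task that can still hold the new task."""
    candidates = [p for p in preserved if p.width >= new_width]
    return min(candidates, key=lambda p: p.width, default=None)


def least_probability_of_recurrence(preserved: List[PreservedTask],
                                    new_width: int) -> Optional[PreservedTask]:
    """Among tasks that are wide enough, pick the one least likely to recur."""
    candidates = [p for p in preserved if p.width >= new_width]
    return min(candidates, key=lambda p: p.recurrence_probability, default=None)


# Hypothetical preserved area contents.
area = [
    PreservedTask("A", width=30, recurrence_probability=0.2),
    PreservedTask("B", width=12, recurrence_probability=0.7),
    PreservedTask("C", width=20, recurrence_probability=0.1),
]
print(best_fit(area, new_width=10).name)                         # B (tightest fit)
print(least_probability_of_recurrence(area, new_width=10).name)  # C (least likely to recur)
print(best_fit(area, new_width=40))                              # None (nothing fits)
```

The example shows why the policies can disagree: best-fit wastes the least width, while LPR gives up the configuration that is least likely to be needed again.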
Pseudo-code of Algorithm
if (SGi = 0) /* Non-significant task */
    if (free area in P1) { Use Stuffing method }
    else /* No free area */
        { if (Partition resizing is feasible) { P1 is expanded and new task is placed in P1 }
          else /* No resizing */ { Task is rejected } }
else /* SGi = 1: Significant task */
    { Scan occupied area and preserved area of P2 }
    if (Task exists in P2) /* Reuse task */
        if (existing instance is running)
            { fj = sj + ej    /* Finish time of running instance = start time + execution time */
              LST = di - ei   /* Latest start time of new task = deadline - execution time */
              if (LST > fj) { Let the running instance finish and schedule the new task to run immediately after it }
              else /* LST < fj */
                  if (Free area is available) { Use free area }
                  elseif (Partition expansion is feasible) { Resize the partition }
                  elseif (Any task on P2 can be replaced) { Replace it with the new task }
                  else { New task is rejected } }
        else { Use free area }
Stuffing method: schedules tasks into arbitrary free rectangles that will exist in the future
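The branch for a significant task whose configuration is already running on P2 can be sketched in Python. This covers only that one branch, not the whole algorithm. The function name, parameter names, and returned labels are hypothetical, and the boolean inputs stand in for checks on the RPU state that the slides do not spell out.

```python
def place_reusable_task(s_j: int, e_j: int, d_i: int, e_i: int,
                        free_area_available: bool,
                        expansion_feasible: bool,
                        replacement_possible: bool) -> str:
    """Decide what to do with a new task whose configuration is currently running.

    s_j, e_j: start time and execution time of the running instance.
    d_i, e_i: deadline and execution time of the newly arrived task.
    """
    f_j = s_j + e_j   # finish time of the running instance
    lst = d_i - e_i   # latest start time of the new task

    if lst > f_j:
        # The new task can wait for the running instance and reuse its configuration.
        return "run_after_current_instance"

    # Otherwise fall back in the order given by the pseudo-code.
    # (The pseudo-code labels this branch "LST < fj"; equality also lands here.)
    if free_area_available:
        return "use_free_area"
    if expansion_feasible:
        return "resize_partition"
    if replacement_possible:
        return "replace_preserved_task"
    return "reject"


# Running instance finishes at 10; new task must start by 30 - 10 = 20.
print(place_reusable_task(0, 10, 30, 10, False, False, False))  # run_after_current_instance
# New task must start by 15 - 10 = 5, before the instance finishes at 10.
print(place_reusable_task(0, 10, 15, 10, False, True, True))    # resize_partition
```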
Simulation Setup
- Implemented in C++
- Run on a 1 GHz Pentium III computer
- Simulated device: 96 × 64 RCU (Xilinx XCV1000 FPGA, Virtex 2.5 V)
- Four groups of input tasks
- Each task group includes 20 task sets, each of which includes 50 synthetic tasks
Simulation Setup Cont’d
- The simulated arriving tasks have a probability distribution rate that is based on real-world applications, as shown in Table 1
Simulation Setup Cont’d
- There are four groups of input, distinguished by a parameter called repetition rate (RR):
  - Group A: 5 ≤ RR_A ≤ 15
  - Group B: 15 < RR_B ≤ 30
  - Group C: 30 < RR_C ≤ 60
  - Group D: RR_D = 0
- The other parameters are shown in Table 2
Simulation Setup Cont’d
- For the simulation results, a Workload parameter is used instead of the task arrival times, where:
  - Ti = arriving task
  - ΔT = time interval from the arrival of the first task to the arrival of the last task
  - W = total width of the RPU
Simulation Setup Cont’d
- Algorithms compared:
  - Reusing-Based Scheduling (RBS)
  - Window-based Stuffing (WBS)
  - Fragmentation-Aware by Handa (FA-H)
  - Fragmentation-Aware by Cui (FA-C)
Simulation Results
- Figure panels: Task Group A, Task Group B, Task Group C, Task Group D
- As the task repetition rate becomes larger, the task rejection ratio (TRR) of RBS becomes lower.
- As expected, RBS does not improve the TRR in Group D, because the repetition rate there is zero.
Total Execution Time of Tasks
- RBS achieves a lower total execution time than the other algorithms because it reuses configurations heavily.
Runtime Comparison
Critiques
- Can be reproduced, but should use the latest Xilinx device (Virtex-6 or Virtex-7) and a faster computer
- Well written, with good flow from beginning to end
- Simulation results: the Workload parameter was not clearly explained for the various algorithms
- Only uses the 1D area model, although 2D area models use configurable resources more efficiently
The algorithm presented in this article can be reproduced, but it should be reproduced using the latest Xilinx device (Virtex-6) and a faster computer. The paper was well written, with a good flow from a short background at the beginning to the simulation results at the end. The Workload simulation results were not really necessary; the more important ones were the execution time and runtime comparison results. The paper only uses the 1D area model, although there are 2D area models that utilize configurable resources more efficiently.
Conclusion
- Reusing-Based Scheduling (RBS) performs on-line scheduling and placement based on maximal task reuse.
- RBS addresses the main goals:
  - Minimizes the task rejection ratio (TRR)
  - Has the lowest execution time of the tasks
  - Reduces reconfiguration overhead
THANK YOU