Kandemir 224/MAPLD 2004: Reliability-Aware OS Support for FPGA-Based Systems. M. Kandemir, G. Chen, and F. Li, Department of Computer Science & Engineering.


1 Kandemir 224/MAPLD 2004. Reliability-Aware OS Support for FPGA-Based Systems. M. Kandemir, G. Chen, and F. Li, Department of Computer Science & Engineering, The Pennsylvania State University, USA.

2 Outline: Introduction; Background knowledge; Improving reliability by duplicating tasks; Experimental results; Ongoing work and conclusion.

3 Acronyms. FPGA: Field Programmable Gate Array. CLB: Configurable Logic Block. STG: Subtask Graph.

4 Introduction. FPGAs combine the flexibility of software with the high performance of ASICs. Prior research mostly addressed architecture design, programming, and compilation issues. Increasing soft-error rates make reliability an important factor in system design. Our focus: reliability-aware OS scheduling for FPGA-based systems.

5 The Reconfigurable System. [Figure: a 6×8 array of configurable logic blocks (CLBs) with Process 1, Process 2, and Process 3 mapped onto it; the interconnects and input-output blocks are omitted.]

6 Improving Reliability. Traditionally, the OS scheduler runs multiple processes in parallel to maximize FPGA space utilization. Data dependencies between processes might prevent full utilization of the FPGA space. Our approach uses the available FPGA space to duplicate processes and improve reliability.

7 Duplicating Processes. [Figure: the CLB array holding Process 1, Process 2, and Process 3, together with a duplicate of Process 3 and a duplicate of Process 1.]

8 Issues in Duplicating Processes. Tasks (processes) differ in criticality. Each task may require a different amount of FPGA space. Duplication can degrade performance, so we use a QoS parameter to indicate the maximum tolerable performance degradation. A checker task is scheduled for each duplicated task to compare the outputs of the primary task and its duplicate.
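The checker idea on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `CheckResult`, `run_checked`, `checksum`, and `faulty_checksum` are hypothetical, and the tasks are modeled as plain functions rather than FPGA configurations. The sketch only shows detection by comparison, which matches the stated focus on error detection.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class CheckResult:
    """Outputs of the primary task and its duplicate (hypothetical type)."""
    primary_output: int
    duplicate_output: int

    @property
    def error_detected(self) -> bool:
        # The checker flags an error whenever the two outputs disagree.
        return self.primary_output != self.duplicate_output


def run_checked(primary: Callable[[Sequence[int]], int],
                duplicate: Callable[[Sequence[int]], int],
                data: Sequence[int]) -> CheckResult:
    """Run both copies on the same input and hand the outputs to the checker."""
    return CheckResult(primary(data), duplicate(data))


def checksum(data: Sequence[int]) -> int:
    """Stand-in for a task: a 16-bit additive checksum."""
    return sum(data) & 0xFFFF


def faulty_checksum(data: Sequence[int]) -> int:
    """Same task with one output bit flipped, modeling a soft error."""
    return checksum(data) ^ 0x0004


if __name__ == "__main__":
    print(run_checked(checksum, checksum, [1, 2, 3]).error_detected)
    print(run_checked(checksum, faulty_checksum, [1, 2, 3]).error_detected)
```

Comparison alone cannot tell which copy is wrong; it only reports that the two copies disagree.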

9 Subtask Graph (STG). Each process to be scheduled is represented by a subtask graph. Each node represents a portion of the process code (a subtask) that executes in a single time quantum once it is scheduled. The j-th node of process i is denoted STG_ij. An edge v_i → v_j indicates a data or control dependence from v_i to v_j.

10 Subtask Graph. Since our processes are extracted from the same application, there might be data dependences between different processes.
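One way to picture the subtask graph described on the two slides above is the following Python sketch. The class names `Subtask` and `SubtaskGraph` and the `ready` helper are assumptions made for illustration; the slides do not specify a data structure. Edges may connect subtasks of the same process or of different processes.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Subtask:
    """Node STG_ij: the j-th subtask of process i (one time quantum of work)."""
    process: int
    index: int


class SubtaskGraph:
    """Hypothetical container mapping each subtask to its predecessors."""

    def __init__(self) -> None:
        self.preds: Dict[Subtask, Set[Subtask]] = {}

    def add_dependence(self, src: Subtask, dst: Subtask) -> None:
        """Record an edge src -> dst (data or control dependence)."""
        self.preds.setdefault(src, set())
        self.preds.setdefault(dst, set()).add(src)

    def ready(self, done: Set[Subtask]) -> List[Subtask]:
        """Subtasks not yet run whose predecessors have all completed."""
        candidates = [n for n, p in self.preds.items()
                      if n not in done and p <= done]
        return sorted(candidates, key=lambda n: (n.process, n.index))


s11, s12, s21 = Subtask(1, 1), Subtask(1, 2), Subtask(2, 1)
graph = SubtaskGraph()
graph.add_dependence(s11, s12)  # dependence inside process 1
graph.add_dependence(s11, s21)  # dependence between processes 1 and 2
```

With this graph, only `s11` is ready initially; once it completes, `s12` and `s21` can run in the same quantum if space allows.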

11 Our Approach. Task duplication under QoS guarantees. The current implementation focuses only on error detection. The approach has five steps: annotation, QoS specification, task identification, task ranking, and scheduling.

12 Annotation Step. The application programmer uses annotations to indicate which data structures are critical from the reliability viewpoint.

13 QoS Specification Step. The application programmer also indicates how much added latency is tolerable during application execution in exchange for the reliability provided.

14 Task Identification Step. An automatic code analyzer analyzes the application source code and identifies tasks.

15 Task Ranking Step. Tasks are ranked based on how they operate on critical data, ordered from the most important task to the least important one.
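A small Python sketch of the ranking step follows. It assumes the ranking metric is the number of accesses each task makes to critical data, which is the criterion the experimental setup mentions; the function name `rank_tasks`, the task names, and the access counts are hypothetical.

```python
from typing import Dict, List


def rank_tasks(critical_accesses: Dict[str, int]) -> List[str]:
    """Order tasks from most to least important.

    Importance is taken to be the count of accesses to critical data.
    Ties are broken by task name so the result is deterministic.
    """
    return sorted(critical_accesses,
                  key=lambda name: (-critical_accesses[name], name))


# Hypothetical access counts, as a code analyzer might report them.
counts = {"filter": 3, "encrypt": 10, "log": 0, "pack": 3}
ranking = rank_tasks(counts)
```

Here `ranking` places `encrypt` first and `log` last, so duplication would be offered to `encrypt` before any other task.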

16 Scheduling Step. The OS scheduler is modified so that, whenever there is an opportunity, it duplicates tasks that run on the FPGA device. Whenever the scheduler predicts that the QoS limit is about to be reached, it stops duplicating tasks.
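The scheduling step could be approximated by a greedy loop such as the Python sketch below. This is an illustration under stated assumptions, not the scheduler from the talk: the `Task` fields, the per-task `overhead` prediction, and the choice to skip a task that does not fit in the free CLBs are all hypothetical. The only behavior taken from the slide is that duplication stops once the predicted slowdown would exceed the QoS limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    clbs: int        # CLBs needed by one extra copy (hypothetical unit)
    overhead: float  # predicted extra run time if duplicated and checked


def choose_duplicates(ranked: List[Task], free_clbs: int,
                      baseline_time: float,
                      qos_limit: float = 0.05) -> List[str]:
    """Pick tasks to duplicate, most important first.

    qos_limit is the maximum tolerable slowdown as a fraction of
    baseline_time (0.05 means 5%).
    """
    budget = baseline_time * qos_limit
    spent = 0.0
    chosen: List[str] = []
    for task in ranked:
        if spent + task.overhead > budget:
            break  # QoS limit would be exceeded: stop duplicating
        if task.clbs > free_clbs:
            continue  # assumption: skip a task that does not fit
        free_clbs -= task.clbs
        spent += task.overhead
        chosen.append(task.name)
    return chosen


ranked = [Task("A", 10, 2.0), Task("B", 30, 1.0),
          Task("C", 5, 1.5), Task("D", 5, 2.0)]
picked = choose_duplicates(ranked, free_clbs=20, baseline_time=100.0)
```

In this example the budget is 5 time units: `A` and `C` are duplicated, `B` is skipped for lack of space, and the loop stops at `D` because its overhead would push the total past the limit.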

17 Experimental Setup. An error injection module injects errors with a specified probability. Two real-life embedded applications are used: encr and usonic. The performance of our reliability-aware scheduler is compared with that of a normal Shortest-Job-First scheduler. At most 5% performance degradation is tolerated. Tasks are ranked according to the frequency of their accesses to critical data. Fatal errors are errors that would lead to a crash of the application.
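The slides do not describe how the error injection module works. The Python sketch below shows one common way to inject errors with a specified probability, assuming a single-bit-flip fault model on 32-bit words; `inject_error` and `WORD_BITS` are hypothetical names.

```python
import random

WORD_BITS = 32  # assumed word width for the fault model


def inject_error(word: int, probability: float, rng: random.Random) -> int:
    """Flip one randomly chosen bit of word with the given probability."""
    if rng.random() < probability:
        return word ^ (1 << rng.randrange(WORD_BITS))
    return word


if __name__ == "__main__":
    rng = random.Random(2004)  # fixed seed so runs are repeatable
    outputs = [0xDEADBEEF] * 1000
    corrupted = sum(inject_error(w, 0.01, rng) != w for w in outputs)
    print(f"{corrupted} of {len(outputs)} words corrupted")
```

A probability of 0 never changes the word and a probability of 1 always flips exactly one bit, which makes the two extremes easy to check.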

18 Experimental Data

19 Ongoing Work. Experimenting with a diverse set of benchmarks. Implementing task duplication within other types of OS schedulers, such as First-Come-First-Served.

20 Conclusion. The OS scheduler tries to provide reliability through task duplication under QoS guarantees. It improves FPGA space utilization by duplicating for reliability, provides reliability for critical tasks first, and catches most fatal errors.

