Kandemir 224/MAPLD 2004. Reliability-Aware OS Support for FPGA-Based Systems. M. Kandemir, G. Chen, and F. Li, Department of Computer Science & Engineering.

Presentation transcript:

Reliability-Aware OS Support for FPGA-Based Systems. M. Kandemir, G. Chen, and F. Li, Department of Computer Science & Engineering, The Pennsylvania State University, USA.

Introduction and Acronyms. Increasing soft-error rates make reliability an important factor in system design. Our focus: reliability-aware OS scheduling for FPGA-based systems. Acronyms: FPGA (Field Programmable Gate Array), CLB (Configurable Logic Block), STG (SubTask Graph).

The Reconfigurable System. [Figure: a 6x8 array of CLBs (Configurable Logic Blocks) hosting Process 1, Process 2, and Process 3; the interconnects and input-output blocks are omitted.]

Improving Reliability. Traditionally, the OS scheduler schedules parallel executions of multiple processes to maximize FPGA space utilization. Data dependences between different processes might prevent full utilization of the FPGA space. Our approach uses the available FPGA space to duplicate processes and improve reliability.

Duplicating Processes. [Figure: the CLB array hosting Process 1, Process 2, and Process 3, together with a duplicate of Process 1 and a duplicate of Process 3.]

Issues in Duplicating Processes. Tasks (processes) differ in criticality. Each task may require a different amount of FPGA space. Duplication can degrade performance, so we use a QoS parameter to indicate the maximum tolerable performance degradation. A checker task is scheduled for each duplicated task to compare the outputs of the primary task and its duplicate.
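The checker idea can be illustrated with a minimal Python sketch. The names `run_with_checker`, `MismatchDetected`, and the `fault` hook are hypothetical and do not come from the slides; the sketch only assumes that a task is a deterministic function of its inputs and that the checker compares the two outputs for equality. It detects errors and does not attempt to correct them.

```python
from typing import Any, Callable, Optional


class MismatchDetected(Exception):
    """Raised by the checker when primary and duplicate outputs differ."""


def run_with_checker(
    task: Callable[[Any], Any],
    inputs: Any,
    fault: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Run a task twice and compare the results (error detection only).

    `fault` is a hypothetical hook that corrupts the primary output,
    standing in for a soft error on the FPGA.
    """
    primary = task(inputs)
    duplicate = task(inputs)
    if fault is not None:
        primary = fault(primary)
    # Checker task: the outputs of the primary and the duplicate must agree.
    if primary != duplicate:
        raise MismatchDetected(f"primary={primary!r}, duplicate={duplicate!r}")
    return primary
```

Calling `run_with_checker(sum, [1, 2, 3])` returns the task output, while passing a `fault` that alters the primary output raises `MismatchDetected`.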

Subtask Graph (STG). Each process to be scheduled is represented by a subtask graph. Each node represents a portion of the process code (a subtask) that executes in a single time quantum once it is scheduled. The j-th node of process i is denoted STG_ij. An edge from node v_i to node v_j indicates a data or control dependence from v_i to v_j.

Subtask Graph. Since our processes are extracted from the same application, there might be data dependences between different processes.
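One possible in-memory representation of subtask graphs is sketched below. The class name `SubtaskGraphs`, its methods, and the `(process, index)` tuple encoding of STG_ij are assumptions made for illustration, not the authors' data structure. Dependences may connect subtasks of the same process or of different processes.

```python
from typing import Dict, List, Set, Tuple

Subtask = Tuple[int, int]  # (process i, node j), standing in for STG_ij


class SubtaskGraphs:
    """Subtask graphs of several processes, with cross-process edges allowed."""

    def __init__(self) -> None:
        self.nodes: Set[Subtask] = set()
        self.preds: Dict[Subtask, Set[Subtask]] = {}

    def add_dependence(self, src: Subtask, dst: Subtask) -> None:
        """Record a data or control dependence from src to dst."""
        self.nodes.update((src, dst))
        self.preds.setdefault(dst, set()).add(src)

    def ready(self, completed: Set[Subtask]) -> List[Subtask]:
        """Subtasks not yet run whose predecessors have all completed."""
        return sorted(
            n
            for n in self.nodes
            if n not in completed and self.preds.get(n, set()) <= completed
        )


stg = SubtaskGraphs()
stg.add_dependence((1, 1), (1, 2))  # within process 1
stg.add_dependence((2, 1), (2, 2))  # within process 2
stg.add_dependence((1, 1), (2, 2))  # data dependence across processes
```

In this example `(2, 2)` becomes ready only after both `(2, 1)` and `(1, 1)` complete, which is the kind of cross-process dependence that can leave FPGA space idle.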

Our Approach. Task duplication under QoS guarantees, in five steps: annotation, QoS specification, task identification, task ranking, and scheduling. The current implementation focuses only on error detection.

Our Approach: annotation and QoS specification steps. The application programmer uses annotations to indicate which data structures are critical from the reliability viewpoint. The application programmer also indicates the latency that can be tolerated during application execution as the cost of the reliability provided.

Our Approach: task identification and task ranking steps. An automatic code analyzer analyzes the application source code and identifies tasks. The tasks are then ranked by how they operate on critical data and ordered from the most important to the least important.
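A minimal sketch of the ranking step follows. It assumes that a task's importance is simply the number of accesses it makes to annotated critical data structures; the function name `rank_tasks`, the access-trace format, and the example task and data names are all hypothetical.

```python
from typing import Dict, List, Set


def rank_tasks(accesses: Dict[str, List[str]], critical: Set[str]) -> List[str]:
    """Order tasks from most to least important.

    `accesses` maps a task name to the data structures it touches (one entry
    per access); `critical` holds the names annotated by the programmer.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = {
        task: sum(1 for name in names if name in critical)
        for task, names in accesses.items()
    }
    return sorted(counts, key=lambda task: (-counts[task], task))


ranking = rank_tasks(
    {
        "filter": ["buf", "key", "key"],
        "encode": ["key", "buf"],
        "log": ["buf"],
    },
    critical={"key"},
)
```

Here `filter` accesses the critical structure twice, `encode` once, and `log` never, so `ranking` lists them in that order.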

Our Approach: scheduling step. The OS scheduler is modified so that, whenever there is an opportunity, the OS duplicates tasks that run on the FPGA device. Whenever the scheduler predicts that the QoS limit is about to be reached, it stops duplicating tasks.
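The duplication decision can be sketched as a greedy pass over the ranked tasks. This is an illustrative simplification under stated assumptions, not the scheduler from the talk: the `Task` fields, the per-task `dup_overhead` estimate, and the function `choose_duplicates` are hypothetical, and the space needed by checker tasks is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    clbs: int            # FPGA space the task occupies, counted in CLBs
    rank: int            # 1 = most critical
    dup_overhead: float  # predicted fractional slowdown if duplicated


def choose_duplicates(tasks: List[Task], total_clbs: int, qos_limit: float) -> List[str]:
    """Pick tasks to duplicate, most critical first, within space and QoS."""
    free = total_clbs - sum(t.clbs for t in tasks)
    slowdown = 0.0
    chosen: List[str] = []
    for t in sorted(tasks, key=lambda t: t.rank):
        if t.clbs > free:
            continue  # no room for this duplicate; try a smaller task
        if slowdown + t.dup_overhead > qos_limit:
            break  # QoS limit would be exceeded: stop duplicating
        chosen.append(t.name)
        free -= t.clbs
        slowdown += t.dup_overhead
    return chosen


tasks = [
    Task("p1", clbs=10, rank=1, dup_overhead=0.02),
    Task("p2", clbs=16, rank=2, dup_overhead=0.02),
    Task("p3", clbs=6, rank=3, dup_overhead=0.02),
]
```

With a 48-CLB array and a 5% limit, `choose_duplicates(tasks, 48, 0.05)` selects `p1` and `p3`, because `p2` does not fit in the remaining space. Tightening the limit to 3% leaves only `p1`.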

Experimental Setup. An error injection module injects errors with a specified probability. Two real-life embedded applications are used: encr and usonic. The performance of our reliability-aware scheduler is compared with that of a normal Shortest-Job-First scheduler. At most 5% performance degradation is tolerated. Tasks are ranked according to the frequency of their accesses to critical data. Fatal errors are errors that would crash the application.
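An error injector of this kind might look like the following sketch. The slides state only that errors are injected with a specified probability; the single-bit-flip fault model, the 32-bit word width, and the name `inject_bit_flip` are assumptions made for illustration.

```python
import random


def inject_bit_flip(value: int, probability: float, rng: random.Random, width: int = 32) -> int:
    """With the given probability, flip one randomly chosen bit of `value`.

    The single-bit-flip model is an assumption; it is a common way to
    emulate a soft error in a stored word.
    """
    if rng.random() < probability:
        return value ^ (1 << rng.randrange(width))
    return value


rng = random.Random(224)  # fixed seed so that experiments are repeatable
word = 0x12345678
never = inject_bit_flip(word, 0.0, rng)
always = inject_bit_flip(word, 1.0, rng)
```

Passing an explicit `random.Random` instance keeps injection reproducible across runs, which helps when comparing schedulers under the same error sequence.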

Experimental Data

Ongoing Work. Experimenting with a diverse set of benchmarks. Implementing task duplication within other types of OS schedulers, such as First-Come-First-Served.

Conclusion. The OS scheduler tries to provide reliability through task duplication under QoS guarantees: it improves FPGA space utilization by duplicating for reliability, provides reliability for critical tasks first, and catches most fatal errors.