Mixed Criticality Systems: Beyond Transient Faults Abhilash Thekkilakattil, Alan Burns, Radu Dobrin and Sasikumar Punnekkat.

Presentation transcript:


Motivation and Contribution
• State-of-the-art mixed criticality scheduling mainly focuses on worst-case execution time (WCET) overruns
• WCET overruns are one example of transient faults
• We propose an approach for the design and scheduling of mixed criticality systems under permanent faults

Introduction
• Mixed criticality scheduling deals with scheduling real-time tasks with varying levels of WCET assurances
• Growing interest in mixed criticality scheduling since Vestal’s RTSS’07 paper
  – 230 citations according to Google Scholar
  – Over 200 follow-up papers according to “Mixed Criticality Systems - A Review” (6th ed.) by Burns and Davis

Goals of Mixed Criticality Scheduling
• Enable certification by different certifying authorities
  – Demonstrate timeliness under different WCETs
• Enable efficient utilization of the underlying computing infrastructure
  – Enabling safe sharing of the computing infrastructure
  – Ensuring isolation of critical from less critical tasks

State of the Art Mixed Criticality Scheduling
• Criticality monotonic priority ordering
• Adaptive and static scheduling
• Scheduling with virtual deadlines/periods
• Mixed criticality scheduling under faults

The Dependability Perspective
[Figure: dependability tree, with one part annotated “Focus of MC scheduling”]
• Dependability
  – Attributes: Availability, Reliability, Safety, Confidentiality, Integrity, Maintainability
  – Threats: Faults, Errors, Failures
  – Means: Fault Prevention, Fault Tolerance, Fault Removal, Fault Forecasting
Source: Avizienis et al., “Basic Concepts and Taxonomy of Dependable and Secure Computing”, IEEE Transactions on Dependable and Secure Computing, 2004

Faults, Errors and Failures
Fault → Error → Failure
• WCET overrun → Task deadline miss → High criticality deadline miss
• A bit flip → Wrong computed value → Incorrect actuation
Many types of faults other than WCET overruns are not covered by Vestal-like models

Classification of Faults
• Transient Faults
  – Fault whose presence is limited in time
  – Examples include bit flips and WCET overruns
  – Solution: temporal redundancy, e.g., task re-executions
• Permanent Faults
  – Fault whose presence is continuous in time
  – Examples include memory and processor failures
  – Solution: spatial redundancy, e.g., using additional hardware

Transient Fault Tolerance
• Temporal redundancy: replicate the tasks in time
  – Re-execute the task
  – Execute an alternate task
• The time for re-execution/alternate task execution can be seen as the “extra time” needed in Vestal’s model
[Figure: increasing execution budgets labelled Level 1 WCET, Level 2 WCET, Level 3 WCET, Level 4 WCET]
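The idea that re-execution time plays the role of the “extra time” in Vestal’s model can be illustrated with a minimal sketch. It assumes, purely for illustration, that each level tolerates a fixed number of re-executions of the same task and that a re-execution costs the same as the first run; the function names and the numbers are hypothetical and do not come from the slides.

```python
from typing import List


def wcet_with_reexecutions(base_wcet: float, reexecutions: int) -> float:
    """Budget for one job that may be re-executed `reexecutions` times.

    Assumption (not from the slides): every re-execution costs `base_wcet`.
    """
    if base_wcet <= 0 or reexecutions < 0:
        raise ValueError("base_wcet must be positive, reexecutions non-negative")
    return base_wcet * (1 + reexecutions)


def level_budgets(base_wcet: float, reexecutions_per_level: List[int]) -> List[float]:
    """Return one execution budget per level (Level 1 WCET, Level 2 WCET, ...)."""
    return [wcet_with_reexecutions(base_wcet, k) for k in reexecutions_per_level]


# Hypothetical task: 2 time units per run, tolerating 0..3 transient faults.
budgets = level_budgets(2.0, [0, 1, 2, 3])
```

Under these assumptions the budgets grow with the level, which mirrors the increasing Level 1 to Level 4 WCETs on the slide.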

Classification of Faults
• Transient Faults
  – Fault whose presence is limited in time
  – Examples include bit flips and WCET overruns
  – Solution: temporal redundancy, e.g., task re-executions
• Permanent Faults (focus of this paper)
  – Fault whose presence is continuous in time
  – Examples include memory and processor failures
  – Solution: spatial redundancy, e.g., using additional hardware

Focus of this Paper
How to design mixed criticality real-time architectures to tolerate permanent faults?
Contribution:
1. Propose a fault coverage based mapping of criticalities
2. Present a taxonomy of fault tolerance mechanisms in the context of mixed criticality systems

Classification of Permanent Faults
• Design Faults
  – Faults due to deficiencies in design and development, e.g., manufacturing defects in computers
  – Hardware and software design faults
• Random Faults
  – Faults for which neither the time of occurrence nor the cause can be determined, e.g., faults due to wear and tear
• Byzantine Faults
  – Faults in which replicas behave arbitrarily differently
  – Worst kind of faults: require a high amount of redundancy

Tolerating Permanent Faults
Requires additional hardware (N-modular paradigm)
  – Replicate the tasks on multiple hardware
  – Perform voting to determine and mask failures
  – Diversity to prevent common cause failures
[Figure: input → Replica 1, Replica 2, Replica 3 → Voter → output]
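The voting step of the N-modular paradigm can be sketched as a strict majority vote over the replica outputs. This is a minimal illustration, not the voter used by the authors; the function name `majority_vote` is hypothetical, and outputs are assumed to be directly comparable values.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas.

    Returns None when no value reaches a majority, i.e. the failure
    cannot be masked and must be handled elsewhere.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


# Three replicas, one of which delivers a wrong value: the fault is masked.
masked = majority_vote([7, 7, 9])
```

With three replicas, one faulty output is outvoted by the two correct ones; if all three disagree, the sketch reports that no majority exists.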

Goals of Mixed Criticality Scheduling
• Enable certification by different certifying authorities
  – Demonstrate timeliness under different WCETs
• Enable efficient utilization of the underlying computing infrastructure
  – Enabling safe sharing of the computing infrastructure
  – Ensuring isolation of critical from less critical tasks
Timeliness does not imply certification
Safety standards mandate redundancy for safety
Highest level of “protection” for all tasks?

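Mapping Criticalities Based on Fault Coverage
[Table: fault coverage per criticality level; cells are filled in later in the talk]
Columns: Criticality | Transient Faults | Random Faults | Software Faults | Hardware Faults | Byzantine Faults
(Software Faults and Hardware Faults are grouped as Design Faults)
Rows: High, Medium, Low, Non-critical
Legend: Partially covered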

High Criticality Tasks
• Dedicated hardware to guarantee isolation
• 3b+1 replicas and a byzantine fault tolerance mechanism to tolerate b byzantine faults
• Hardware and software diversity to protect against design faults
[Figure: input → Replica 1, Replica 2, Replica 3, …, Replica 3b+1 → Voter (byzantine fault tolerance) → output]
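The 3b+1 replica requirement stated on this slide can be turned into a small sizing helper. The sketch below only encodes that arithmetic; the function names are hypothetical.

```python
def replicas_for_byzantine(b: int) -> int:
    """Number of replicas needed to tolerate b byzantine faults (3b + 1)."""
    if b < 0:
        raise ValueError("b must be non-negative")
    return 3 * b + 1


def max_byzantine_faults(replicas: int) -> int:
    """Largest b such that 3b + 1 <= replicas."""
    if replicas < 1:
        raise ValueError("at least one replica is required")
    return (replicas - 1) // 3


# Tolerating a single byzantine fault already needs four replicas.
needed = replicas_for_byzantine(1)
```

The inverse helper shows why adding a fifth or sixth replica buys nothing extra: tolerance only increases at 4, 7, 10, and so on.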

Medium Criticality Tasks
• High integrity hardware that is shared among medium criticality tasks
• Time triggered scheduling and lock-step execution
• Replication for protection against random faults
• Hardware and software diversity for protection against design faults
[Figure: high integrity processor 1 and high integrity processor 2 each run Task A and Task B → Voter → output]

Low Criticality Tasks
• COTS hardware, e.g., a multicore processor, that is shared among low criticality tasks
• Time aware voter and loose synchronization: less development effort
• Replication for protection against random faults
• Software diversity for protection against software design faults
Time aware voter:
  – Manages outputs delivered at different instants
  – Signals early and late timing errors
[Figure: Core 1 (scheduler: EDF) and Core 2 (scheduler: FPS) each run Task A and Task B → Time aware voter → output; timeline shows outputs A1, A2 and B2, with Task B's execution unfinished on one core]
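A time aware voter has to cope with replica outputs that arrive at different instants and flag those that arrive too early or too late. The following is a minimal sketch under assumptions that are not in the slides: each output carries a delivery time, an acceptance window is known in advance, and a value is accepted only if a strict majority of all replicas delivered it inside the window. All names (`ReplicaOutput`, `time_aware_vote`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Hashable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ReplicaOutput:
    replica: str      # e.g. "A1" for Task A on Core 1
    value: Hashable   # value delivered by the replica
    time: float       # instant at which the value was delivered


def time_aware_vote(
    outputs: Sequence[ReplicaOutput], window_start: float, window_end: float
) -> Tuple[Optional[Hashable], List[str], List[str]]:
    """Vote over timely outputs and report early and late replicas."""
    early = [o.replica for o in outputs if o.time < window_start]
    late = [o.replica for o in outputs if o.time > window_end]
    timely = [o.value for o in outputs if window_start <= o.time <= window_end]

    result = None
    if timely:
        value, count = Counter(timely).most_common(1)[0]
        # Majority is taken over all replicas, not only the timely ones.
        if count > len(outputs) // 2:
            result = value
    return result, early, late


# Three replicas of Task A; the third one delivers after the window closes.
outs = [
    ReplicaOutput("A1", 5, 10.0),
    ReplicaOutput("A2", 5, 12.0),
    ReplicaOutput("A3", 5, 30.0),
]
result, early, late = time_aware_vote(outs, window_start=8.0, window_end=20.0)
```

Counting the majority against all replicas rather than only the timely ones is a design choice of this sketch; it keeps a single timely replica from deciding the output on its own.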

Non-Critical Tasks
• Scheduled along with low criticality tasks
• Timeliness is guaranteed in the absence of faults
• Discarded upon failures
• Possibility of using existing MC scheduling algorithms
• Guarantees isolation of higher criticality tasks
• Limited form of redundancy can be provided by exploiting spare processing capacity

Mapping Criticalities Based on Fault Coverage
Columns: Criticality | Transient Faults | Random Faults | Software Faults | Hardware Faults | Byzantine Faults
(Software Faults and Hardware Faults are grouped as Design Faults)
Mechanisms listed per criticality level:
• High: redundancy, software diversity, hardware diversity, byzantine fault tolerance
• Medium: redundancy, software diversity, hardware diversity
• Low: redundancy, software diversity
• Non-critical: limited redundancy
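The mapping of criticality levels to fault tolerance mechanisms can be captured as a small lookup table. The level names and mechanisms below are taken from the slides; the dictionary layout and the helper `levels_using` are hypothetical and only meant as a sketch of how such a mapping might be queried at design time.

```python
from typing import Dict, List

# Mechanisms provisioned per criticality level, as listed on the slide.
MECHANISMS: Dict[str, List[str]] = {
    "High": [
        "redundancy",
        "software diversity",
        "hardware diversity",
        "byzantine fault tolerance",
    ],
    "Medium": ["redundancy", "software diversity", "hardware diversity"],
    "Low": ["redundancy", "software diversity"],
    "Non-critical": ["limited redundancy"],
}


def levels_using(mechanism: str) -> List[str]:
    """Return the criticality levels that are provisioned with `mechanism`."""
    return [level for level, mechs in MECHANISMS.items() if mechanism in mechs]


# Which levels get hardware diversity?
hw_levels = levels_using("hardware diversity")
```

Each level gets a subset of the mechanisms of the level above it, except that non-critical tasks receive only limited redundancy, which matches the idea of provisioning resources according to criticality.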

Conclusions
Approach for the design of mixed criticality systems in the context of permanent faults through:
  – Fault coverage based mapping of criticalities
  – Criticality based provisioning of resources
  – Isolation of higher criticality tasks
  – Implicit coverage of WCET overrun faults
Future Work
  – Methods for efficient allocation of replicas to processors
  – Consideration of safety analysis in the allocation and scheduling of tasks
  – Providing better-than-average service to non-critical tasks

Questions? Thank you!