Toward Energy-Aware Software-Based Fault Tolerance in Real-Time Systems Osman S. Unsal, Israel Koren, C. Mani Krishna Architecture and Real-Time Systems.


Toward Energy-Aware Software-Based Fault Tolerance in Real-Time Systems
Osman S. Unsal, Israel Koren, C. Mani Krishna
Architecture and Real-Time Systems Laboratory
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst

The Problem
- Real-Time (RT) systems are energy and thermal constrained.
  - Many RT applications run on battery-powered platforms.
  - RT systems require a small form factor.
- Fault Tolerance (FT) is an important design parameter in RT systems.
  - Many RT applications are life-critical.
  - Many RT systems operate in hostile (industrial, space) environments.
- FT ensures error-free operation in the face of faults.

Fault Tolerance in RT Systems
- Hardware-based fault tolerance
  - Massive redundancy (duplex, TMR)
  - Requires additional hardware for the error-checking mechanism
  - Very power-inefficient
- Software-based fault tolerance
  - Application-Level Fault Tolerance (ALFT), an amalgam of time and software redundancy

ALFT Characteristics
- Each task has a primary and a secondary copy.
- Secondaries may be exact copies of the primaries, or they may be scaled-down versions:
  - Resolution reduction
  - Precision reduction
- A secondary may be aborted if its primary finishes execution successfully.
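The abort rule above can be illustrated with a small sketch. It is not taken from the slides: the `Copy` class, the `secondary_run_time` function, and the assumption that each copy runs contiguously from its start time are all hypothetical, chosen only to show how much of a secondary executes before a successful primary lets it be aborted.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy (primary or secondary) of a task; hypothetical helper type."""
    start: float      # time the copy begins executing
    exec_time: float  # time needed to run to completion

    @property
    def finish(self) -> float:
        return self.start + self.exec_time


def secondary_run_time(primary: Copy, secondary: Copy, primary_ok: bool) -> float:
    """Return how long the secondary actually executes.

    Assumption: copies run contiguously (no preemption). If the primary
    succeeds, the secondary is aborted at the primary's finish time, so only
    the part that ran before that instant counts. If the primary fails, the
    secondary runs to completion.
    """
    if not primary_ok:
        return secondary.exec_time
    ran = primary.finish - secondary.start
    return min(max(ran, 0.0), secondary.exec_time)


primary = Copy(start=0.0, exec_time=10.0)
scaled_down = Copy(start=6.0, exec_time=5.0)   # a 50% secondary
print(secondary_run_time(primary, scaled_down, primary_ok=True))   # 4.0
print(secondary_run_time(primary, scaled_down, primary_ok=False))  # 5.0
```

In this sketch, a secondary that starts only after its primary has finished contributes no execution time in the fault-free case.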

The System Model
- Distributed RT system
- Tasks are periodic and have deadlines.
- Each primary has one secondary.
- A primary and its secondary are assigned to separate processors.
- The focus is on scheduling; results are compared with respect to Earliest Deadline First (EDF).
- Tasks have random periods and execution times.
- Six-processor configuration

Energy Model
- The longer a task executes, the more energy it consumes.
- Energy is assumed to scale linearly with task execution time.
- This is appropriate for COTS processors.
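A minimal sketch of the linear model, assuming a constant energy cost per unit of execution time. The function names and the `energy_per_unit_time` parameter are illustrative and do not come from the slides.

```python
def task_energy(exec_time: float, energy_per_unit_time: float = 1.0) -> float:
    """Energy consumed by a task that executes for exec_time time units.

    Assumption: energy grows linearly with execution time, with a constant
    (hypothetical) cost per time unit.
    """
    if exec_time < 0:
        raise ValueError("exec_time must be non-negative")
    return energy_per_unit_time * exec_time


def relative_energy(primary_time: float, secondary_time_run: float) -> float:
    """Energy of primary plus secondary, relative to the primary alone."""
    total = task_energy(primary_time) + task_energy(secondary_time_run)
    return total / task_energy(primary_time)


print(relative_energy(10.0, 10.0))  # 2.0: full duplicate runs to completion
print(relative_energy(10.0, 5.0))   # 1.5: 50% secondary runs to completion
print(relative_energy(10.0, 0.0))   # 1.0: secondary never executes
```

Under this model, any reduction in the time a secondary spends executing translates directly into a proportional energy reduction.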

Overlap

A Simple Energy-Saving Heuristic: Shortest Execution-Time First (SEF)
[Figure: relative energy consumption of EDF and SEF, for power-unaware task duplication (secondary size 100%) and power-aware ALFT (secondary size 50%)]
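The slide names the heuristic but does not spell out its rules, so the following is only a sketch of the ordering its name suggests, set beside EDF for contrast. The `Job` type, the function names, and the tie-breaking choices are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    """A ready job; hypothetical type for illustration."""
    name: str
    exec_time: float
    deadline: float


def edf_order(jobs: List[Job]) -> List[Job]:
    """Earliest Deadline First: earliest deadline runs first.

    Ties are broken by shorter execution time (an assumption).
    """
    return sorted(jobs, key=lambda j: (j.deadline, j.exec_time))


def sef_order(jobs: List[Job]) -> List[Job]:
    """Shortest Execution-Time First: shortest job runs first.

    Ties are broken by earlier deadline (an assumption).
    """
    return sorted(jobs, key=lambda j: (j.exec_time, j.deadline))


ready = [Job("A", 4.0, 10.0), Job("B", 1.0, 20.0), Job("C", 2.0, 15.0)]
print([j.name for j in edf_order(ready)])  # ['A', 'C', 'B']
print([j.name for j in sef_order(ready)])  # ['B', 'C', 'A']
```

This sketch only orders the ready jobs; it does not check that the resulting schedule meets deadlines, which a real scheduler would have to do.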

Another Heuristic: Secondary Execution Time Shifting (SETS)

Case study: Asymmetric Digital Subscriber Line Modem Application
[Table columns: Period, WCET]

Energy Savings for the ADSL Application
[Table columns: Secondary Size (%), Energy Savings (%)]

Energy Savings for Different Secondary Sizes

Overlap Reduction for Different Secondary Sizes (20 tasks)

Overlap Reduction for Different Secondary Sizes (50 tasks)

Effect of Task Granularity on Energy Savings (Secondary Size 80%)

Effect of Task Granularity on Overlap Reduction (Secondary Size 80%)

Summary
- An initial analysis of the energy efficiency of various fault-tolerance mechanisms has been made.
- Power-aware scheduling heuristics for ALFT schemes have been developed.
- Current activity:
  - On-line scheduling heuristics
  - Power-aware dynamic voltage scaling (DVS) for FT systems