On-the-fly Healing of Race Conditions in ARINC-653 Flight Software


Ok-Kyoon Ha, Guy Martin Tchamgoue, Jeong-Bae Suh, and Yong-Kee Jun
Department of Informatics, Gyeongsang National University, Republic of Korea

Contents
- ARINC-653
- ARINC-653 Health Management
- Data Races
- On-the-fly Race Healing Framework
- Race Healing Mechanism
- Development
- Evaluation
- Conclusion

ARINC-653
The ARINC-653 standard defines an application executive (APEX) that provides OS or middleware services for integrated modular avionics (IMA). Its main objective is temporal and spatial partitioning, which lets applications, each executing in its own partition, run simultaneously and independently on the same architecture.
- Temporal partitioning enforces strict time slicing, guaranteeing that only one application accesses resources at any given time.
- Spatial partitioning enforces strict memory management, guaranteeing that each partition has exclusive access to its memory area.

ARINC-653 Health Management (1/2)
An important feature of ARINC-653 is its health monitor (HM). The HM is responsible for detecting hardware and software failures and for providing recovery mechanisms for them. Its objective is to contain and isolate faults before they propagate across the whole system. For precise error handling, the HM manages recovery tables at three levels, each indexed by both the error identifier and the system state:
- System HM table
- Module HM table
- Partition HM table
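The table lookup described above can be sketched roughly as follows. This is only an illustration of a recovery table indexed by error identifier and system state; every enum value, type name, and table entry here is hypothetical and is not taken from the ARINC-653 specification or from the authors' implementation.

```c
#include <stddef.h>

/* Hypothetical error identifiers, system states, and recovery actions. */
typedef enum { ERR_DEADLINE_MISSED, ERR_MEMORY_VIOLATION,
               ERR_APPLICATION_ERROR, ERR_COUNT } hm_error_t;
typedef enum { STATE_MODULE_INIT, STATE_PARTITION_INIT,
               STATE_PROCESS_EXECUTION, STATE_COUNT } hm_state_t;
typedef enum { ACT_IGNORE, ACT_INVOKE_ERROR_HANDLER,
               ACT_RESTART_PARTITION, ACT_RESET_MODULE } hm_action_t;

/* One recovery table: a recovery action for each (error, state) pair. */
typedef struct {
    hm_action_t action[ERR_COUNT][STATE_COUNT];
} hm_table_t;

/* Example contents for a partition-level table (made-up values). */
const hm_table_t example_partition_table = {
    .action = {
        [ERR_DEADLINE_MISSED] = {
            [STATE_MODULE_INIT]       = ACT_RESET_MODULE,
            [STATE_PARTITION_INIT]    = ACT_RESTART_PARTITION,
            [STATE_PROCESS_EXECUTION] = ACT_INVOKE_ERROR_HANDLER,
        },
        [ERR_MEMORY_VIOLATION] = {
            [STATE_MODULE_INIT]       = ACT_RESET_MODULE,
            [STATE_PARTITION_INIT]    = ACT_RESTART_PARTITION,
            [STATE_PROCESS_EXECUTION] = ACT_RESTART_PARTITION,
        },
        [ERR_APPLICATION_ERROR] = {
            [STATE_MODULE_INIT]       = ACT_IGNORE,
            [STATE_PARTITION_INIT]    = ACT_RESTART_PARTITION,
            [STATE_PROCESS_EXECUTION] = ACT_INVOKE_ERROR_HANDLER,
        },
    },
};

/* Look up the recovery action; fall back to the most conservative
 * action when the table or either index is invalid. */
hm_action_t hm_lookup(const hm_table_t *table, hm_error_t err, hm_state_t state)
{
    if (table == NULL ||
        (int)err < 0 || (int)err >= ERR_COUNT ||
        (int)state < 0 || (int)state >= STATE_COUNT) {
        return ACT_RESET_MODULE;
    }
    return table->action[err][state];
}
```

With these made-up entries, an application error raised while a process is executing resolves to invoking the error handler, which is the path the race healing framework relies on.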

ARINC-653 Health Management (2/2)
For errors at the process level, the HM invokes a user-defined aperiodic error handler. The error handler should be efficient and execute as quickly as possible so that it does not monopolize the system.
[Figure: RTOS configuration (XML), health monitor, module OS, and applications]

Data Races (1/2)
A data race may occur when two concurrent threads access a shared memory location without proper inter-thread coordination and at least one of the accesses is a write. Unpredictable and mysterious results caused by data races may be reported to the programmer.
Example of a multithreaded program, considering the "dCount++" instruction:
Thread A: // dCount is shared
    Lock(L1);
    Read dCount;
    Add one;
    Write dCount;
    Unlock(L1);
Thread B:
[Figure: reads and writes of Thread A and Thread B ordered so that the program gives the expected result]

Data Races (2/2)
Depending on the scheduler, the program may run under different interleavings and produce unexpected results. Synchronization errors lead to asymmetric races, in which one thread accesses the shared variable under a lock while another accesses it without proper locking. Symmetric races are usually benign, but asymmetric races are generally harmful; our race healing is motivated by these harmful races.
[Figure: two interleavings of the reads and writes of Thread A and Thread B, one giving a satisfactory result and one giving unexpected results]
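The effect of the interleaving on "dCount++" can be reproduced deterministically with a small simulation. The sketch below is not the authors' code: it models each thread as three steps (read, add, write) on a private register and replays a schedule string chosen by the caller. The names `sim_thread_t`, `step`, and `run_schedule` are hypothetical.

```c
/* Each simulated thread executes dCount++ as three separate steps. */
typedef struct {
    int reg;  /* private copy of dCount */
    int pc;   /* 0 = read, 1 = add one, 2 = write, 3 = finished */
} sim_thread_t;

/* Execute the next step of one simulated thread on the shared variable. */
static void step(sim_thread_t *t, int *shared)
{
    switch (t->pc) {
    case 0: t->reg = *shared; break;  /* Read dCount  */
    case 1: t->reg += 1;      break;  /* Add one      */
    case 2: *shared = t->reg; break;  /* Write dCount */
    default: return;                  /* already finished */
    }
    t->pc++;
}

/* Replay a schedule such as "AAABBB": each character names the thread
 * that runs its next step. Returns the final value of dCount. */
int run_schedule(const char *schedule)
{
    int dCount = 0;
    sim_thread_t a = { 0, 0 };
    sim_thread_t b = { 0, 0 };

    for (const char *p = schedule; *p != '\0'; ++p) {
        if (*p == 'A') {
            step(&a, &dCount);
        } else if (*p == 'B') {
            step(&b, &dCount);
        }
    }
    return dCount;
}
```

Calling `run_schedule("AAABBB")` lets Thread A finish before Thread B starts and yields 2, while `run_schedule("ABABAB")` lets both threads read the old value before either writes and yields 1, so one increment is lost.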

On-the-fly Race Healing Framework (1/2)
We reinforce the native health monitoring function of ARINC-653 with race detection and healing abilities.
[Figure: concept of race healing in ARINC-653, showing Thread A and Thread B (read, write), race detection, the health monitor, the partition OS, and race healing (add/remove lock, value checking), connected by "notifies", "invokes", and "heals" arrows]

On-the-fly Race Healing Framework (2/2)
- The instrumented program is monitored by the on-the-fly race detector.
- Once a data race is detected, the HM is notified.
- The partition OS concerned invokes the race healer as the error handler.
- The race healer accesses the racing code and tries to heal the data race.
- If the healer fails, a notification is sent back to the HM, which might launch an emergency recovery function.
[Figure: instrumented program, on-the-fly race detection engine, on-the-fly race healing engine, log, partition OS, health monitor, and native error handler within ARINC 653, with monitoring steps numbered (1) to (5)]

Race Detection Engine
For on-the-fly race detection, our framework uses the protocol presented by Dinning and Schonberg (1991). This protocol guarantees the detection of at least one race for each shared variable, if any exists. It defines the structure and the maintenance policy of an access history that takes locking into account.
[Figure: access history (Read, Write, CS-Read, CS-Write) for an execution labeled TM, TA, and TB with accesses R1, W2, R3, and W4; reported races: W2-R3, R1-W4, W2-W4]
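A much-simplified pairwise check gives a feel for what an access history is used for. The sketch below is not the Dinning and Schonberg protocol: it keeps every access instead of a bounded history, and it treats any two accesses from different threads as concurrent. It also assumes, since the transcript does not say so, that TA performs R1 and W2 and that TB performs R3 and W4. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One recorded access to a shared variable. */
typedef struct {
    const char *label;  /* e.g. "R1" or "W2", kept for reporting */
    int thread;         /* 0 = TA, 1 = TB (assumed assignment) */
    bool is_write;
    unsigned locks;     /* bit set of locks held during the access */
} access_t;

/* Two accesses race if they come from different threads, at least one
 * is a write, and they hold no lock in common. */
bool accesses_race(const access_t *x, const access_t *y)
{
    bool different_threads = x->thread != y->thread;
    bool one_writes = x->is_write || y->is_write;
    bool no_common_lock = (x->locks & y->locks) == 0u;
    return different_threads && one_writes && no_common_lock;
}

/* Count racing pairs in a history of n accesses. */
size_t count_races(const access_t *history, size_t n)
{
    size_t races = 0;
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = i + 1; j < n; ++j) {
            if (accesses_race(&history[i], &history[j])) {
                ++races;
            }
        }
    }
    return races;
}

/* The four accesses from the figure, with caller-chosen lock sets. */
size_t slide_example_races(unsigned ta_locks, unsigned tb_locks)
{
    const access_t history[] = {
        { "R1", 0, false, ta_locks },
        { "W2", 0, true,  ta_locks },
        { "R3", 1, false, tb_locks },
        { "W4", 1, true,  tb_locks },
    };
    return count_races(history, sizeof history / sizeof history[0]);
}
```

Under these assumptions, `slide_example_races(0u, 0u)` finds three racing pairs (R1-W4, W2-R3, and W2-W4), matching the races listed in the figure. `slide_example_races(1u, 0u)` models the asymmetric case where only TA holds a lock and still finds three, while `slide_example_races(1u, 1u)` finds none.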

Race Healing Engine
To heal asymmetric races, our technique inserts a lock into the thread that is unsynchronized or incompletely synchronized, which removes or changes the racy interleaving.
[Figure: three Thread A / Thread B read-write timelines showing race detection followed by healing]
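At source level, the effect of the healing can be pictured as below. This is only a sketch: the framework applies the lock on the fly, and the transcript does not describe how, so the code simply shows the program as it would behave after healing. The function name `count_healed` and the critical-section name `dcount_lock` are hypothetical; the OpenMP directives are standard.

```c
/* Two OpenMP sections stand in for Thread A and Thread B. Each one
 * increments the shared counter dCount iters times. */
long count_healed(long iters)
{
    long dCount = 0;

    #pragma omp parallel sections shared(dCount)
    {
        #pragma omp section
        {
            /* Thread A: already synchronized in the original program. */
            for (long i = 0; i < iters; ++i) {
                #pragma omp critical(dcount_lock)
                {
                    dCount++;
                }
            }
        }

        #pragma omp section
        {
            /* Thread B: originally incremented dCount without a lock
             * (the asymmetric race). Healing corresponds to wrapping
             * the access in the same critical section as Thread A. */
            for (long i = 0; i < iters; ++i) {
                #pragma omp critical(dcount_lock)
                {
                    dCount++;
                }
            }
        }
    }
    return dCount;
}
```

Because both sections now use the same named critical section, no increment is lost and the function returns `2 * iters`. Compiled without OpenMP support, the directives are ignored and the sections run one after the other, giving the same result.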

Development Environment
- Single board computer (SBC) with Intel Xeon dual-core processors (2 CPUs) and 4 GB of memory
- RT-Linux operating system
- GNU C compiler 4.3 for OpenMP
- Simulated integrated modular avionics (SIMA) installed to provide ARINC-653 services
- The race detector and the race healer are both implemented as dynamic libraries in C.
- The healing function is registered in each monitored program as its error handler.
- Upon race detection, the race detector notifies the SIMA HM using the RAISE_APPLICATION_ERROR system call.
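The registration and notification flow in the last two items can be sketched with a tiny stand-in for the health monitor. The code below does not use SIMA or the APEX interface: `sim_register_handler` and `sim_raise_application_error` are hypothetical local functions that only mimic the roles of error handler registration and RAISE_APPLICATION_ERROR, and the handler receives the message directly as a simplification. The rule that only messages starting with "ASYM" can be healed is also made up for the example.

```c
#include <stddef.h>
#include <string.h>

typedef enum { HEAL_OK, HEAL_FAILED } heal_status_t;
typedef heal_status_t (*error_handler_fn)(const char *msg, size_t len);

/* Stand-in health monitor: one registered handler plus two counters. */
typedef struct {
    error_handler_fn handler;
    int healed;     /* errors resolved by the registered handler */
    int escalated;  /* errors handed to emergency recovery */
} sim_hm_t;

/* Register the healing function as the error handler. */
void sim_register_handler(sim_hm_t *hm, error_handler_fn fn)
{
    hm->handler = fn;
}

/* Called by the race detector: the HM invokes the handler and escalates
 * if there is no handler or the handler reports failure. */
void sim_raise_application_error(sim_hm_t *hm, const char *msg, size_t len)
{
    if (hm->handler != NULL && hm->handler(msg, len) == HEAL_OK) {
        hm->healed++;
    } else {
        hm->escalated++;
    }
}

/* Toy healer: pretends that only asymmetric races can be healed. */
heal_status_t race_healer(const char *msg, size_t len)
{
    if (len >= 4 && strncmp(msg, "ASYM", 4) == 0) {
        return HEAL_OK;
    }
    return HEAL_FAILED;
}

/* Register the healer, raise one error, and report whether it healed. */
int notify_and_report(const char *message)
{
    sim_hm_t hm = { NULL, 0, 0 };
    sim_register_handler(&hm, race_healer);
    sim_raise_application_error(&hm, message, strlen(message));
    return hm.healed;
}
```

In this toy setup, `notify_and_report("ASYM dCount")` returns 1 because the healer accepts the race, while `notify_and_report("SYM dCount")` returns 0 because the healer fails and the stand-in HM escalates instead.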

Evaluation
The efficiency of our framework was evaluated by analyzing the overhead of the race healing functions. A set of synthetic programs containing only asymmetric races was developed using OpenMP directives. The overhead comes from the actions of the label generator, the race detector, and the race healer. The results show that, on average, our technique slows the original program execution by a factor of about 2.

Conclusion
Race healing framework
- This paper presents a framework that can be embedded in the ARINC-653 health monitor to detect and heal data races on-the-fly.
- It ensures that the flight software runs safely.
Experimentation and results
- The framework was implemented on the simulated integrated modular avionics (SIMA) platform, which provides ARINC-653 services.
- The experimental results show that, on average, our framework slows the original program execution by a factor of about 2.
- The overhead introduced by our framework is manageable for a large class of soft real-time programs.
- We will extend the healing functionality to handle more general race patterns.