Published by Megan Willis. Modified over 11 years ago.
1
On-the-fly Healing of Race Conditions in ARINC-653 Flight Software
Ok-Kyoon Ha, Guy Martin Tchamgoue, Jeong-Bae Suh, and Yong-Kee Jun Department of Informatics, Gyeongsang National University, Republic of Korea
2
Contents
ARINC-653
ARINC-653 Health Management
Data Races
On-the-fly Race Healing Framework
Race Healing Mechanism
Development
Evaluation
Conclusion
3
ARINC-653
The ARINC-653 standard defines an application executive (APEX) that provides OS or middleware services for integrated modular avionics (IMA).
The main objective of ARINC-653 is to provide temporal and spatial partitioning, so that applications, each executing in its own partition, can run simultaneously and independently on the same architecture.
Temporal partitioning provides strict time slicing to guarantee that only one application accesses resources at any given time.
Spatial partitioning provides strict memory management by guaranteeing that each partition has exclusive access to its memory area.
4
ARINC-653 Health Management (1/2)
An important feature of ARINC-653 is its health monitor (HM).
The HM is responsible for detecting hardware and software failures and for providing recovery mechanisms for them.
Its objective is to contain and isolate faults before they propagate across the whole system.
For precise error handling, the HM manages recovery tables at three levels, each indexed by both the error identifier and the system state:
System HM Table
Module HM Table
Partition HM Table
5
ARINC-653 Health Management (2/2)
For errors at the process level, the HM invokes a user-defined aperiodic error handler.
The error handler should be efficient and execute as quickly as possible so that it does not monopolize the system.
[Figure labels: RTOS, Configuration (XML), Health Monitor, Module OS, Applications, …]
6
Data Races (1/2)
A data race may occur when two concurrent threads access a shared memory location without proper inter-thread coordination and at least one of the accesses is a write.
Data races may cause unpredictable and mysterious results to be reported to the programmer.
An example of a multithreaded program: let’s consider the “dCount++” instruction.
Thread A: //dCount is shared
Lock(L1)
Read dCount;
Add one;
Write dCount;
Unlock(L1);
Thread B:
[Figure: expected result, with Read and Write accesses by Thread A and Thread B]
7
Data Races (2/2)
Under the influence of the scheduler, the program may follow different interleavings and produce unexpected results.
Synchronization errors lead to asymmetric races.
Symmetric races are usually benign, but asymmetric races are generally harmful.
Our race healing is motivated by these harmful races.
[Figure: two interleavings of the Read and Write accesses of Thread A and Thread B, one with a satisfactory result and one with unexpected results]
8
On-the-fly Race Healing Framework (1/2)
We reinforce the native health monitoring function of ARINC-653 with race detection and healing abilities.
[Figure: Concept of race healing in ARINC-653. Components: Thread A, Thread B (Read, Write), Race Detection, Health Monitor, Partition OS, Race Healing (Add/Remove Lock, Value Checking). Arrows: Notifies, Invokes, Heals.]
9
On-the-fly Race Healing Framework (2/2)
The instrumented program is monitored by an on-the-fly race detector.
Once a data race is detected, the HM is notified.
The race healer is invoked by the concerned partition OS as the error handler.
The race healer accesses the racing code and tries to heal the data race.
If the healer fails, a notification is sent back to the HM, which may launch an emergency recovery function.
[Figure: Instrumented Program, On-the-fly Race Detection Engine, On-the-fly Race Healing Engine, Log, Partition OS, Health Monitor, and Native Error Handler within ARINC 653; Monitoring; steps (1) to (5)]
10
Race Detection Engine
For on-the-fly race detection, our framework uses the protocol presented by Dinning and Schonberg (1991).
This protocol guarantees the detection of at least one race for each shared variable, if any exists.
The protocol defines the structure of, and the maintenance policy for, an access history with a locking mechanism.
[Figure: Access History (labels: TM, Read, Write, CS-Read, CS-Write) for threads TA and TB with accesses R1, W2, R3, W4. Reported races: W2-R3, R1-W4, W2-W4.]
11
Race Healing Engine
To heal asymmetric races, our technique inserts a lock into the unsynchronized or incompletely synchronized thread to remove or change the interleaving.
[Figure: Read and Write accesses of Thread A and Thread B at race detection and after healing]
12
Development Environment
Single board computer (SBC) with two dual-core Intel Xeon CPUs and 4 GB of memory
RT-Linux operating system
GNU C compiler 4.3 for OpenMP
Simulated integrated modular avionics (SIMA) was installed to provide ARINC-653 services.
The race detector and the race healer are both implemented as dynamic libraries in C.
The healing function is registered in each monitored program as its error handler.
Upon race detection, the race detector notifies the SIMA HM using the RAISE_APPLICATION_ERROR system call.
13
Evaluation
The efficiency of our framework was evaluated by analyzing the overhead of the race healing functions.
The overhead comes from the actions of the label generator, the race detector, and the race healer.
The results show that our technique slows down the original program execution by a factor of about 2 on average.
A set of synthetic programs that contain only asymmetric races was developed using OpenMP directives.
14
Conclusion
Race Healing Framework
This paper presents a framework that can be embedded in the ARINC-653 health monitor to detect and heal data races on-the-fly.
It ensures that the flight software runs safely.
Experimentation and Result
The framework was implemented on the simulated integrated modular avionics (SIMA), which provides ARINC-653 services.
The experimental results show that our framework slows down the original program execution by a factor of about 2 on average.
The overhead introduced by our framework is manageable for a large class of soft real-time programs.
We will extend the healing functionality to handle more general race patterns.