Healing Data Races On-The-Fly


Bohuslav Krena, Zdenek Letko, Rachel Tzoref, Shmuel Ur, and Tomas Vojnar
Ok-Kyoon Ha, OS Lab., GNU

Contents
- Background
- Motivation
- Self-Healing Steps
  - Problem Detection
  - Problem Localization
  - Problem Healing
  - Healing Assurance
- Experiment
- Conclusion

Background - what is a race?
A data race occurs when two concurrent threads access a shared variable and
- at least one access is a write
- the accesses are unordered by any synchronization
Usually a data race is a serious error caused by a failure to synchronize properly.
This paper distinguishes two kinds of races:
- Atomicity races
- Inherent races

Background - Atomicity Races
Races caused by the wrong assumption that some block of code will be executed atomically.
Executed by both Thread 1 and Thread 2:
    void someMethod( ) { shared = update(shared); }
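A minimal sketch of how such an atomicity race loses an update, assuming a hypothetical update() that simply adds one. To keep the outcome reproducible, the load and store steps of the two threads are replayed by hand on a single thread; the class and method names are illustrative and do not come from the slides.

```java
// Deterministic replay of the lost update; no real threads are started.
class LostUpdateReplay {
    static int shared;

    // Hypothetical stand-in for the slide's update(): adds one.
    static int update(int value) {
        return value + 1;
    }

    // Both "threads" load shared before either one stores its result.
    static int replayBadInterleaving() {
        shared = 0;
        int t1Local = shared;      // Thread 1 loads 0
        int t2Local = shared;      // Thread 2 loads 0 before Thread 1 stores
        shared = update(t1Local);  // Thread 1 stores 1
        shared = update(t2Local);  // Thread 2 stores 1 again: one update is lost
        return shared;
    }

    // Each load-update-store runs to completion before the next one starts.
    static int replayAtomicExecution() {
        shared = 0;
        shared = update(shared);   // Thread 1
        shared = update(shared);   // Thread 2
        return shared;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + replayBadInterleaving());
        System.out.println("atomic:      " + replayAtomicExecution());
    }
}
```

In the interleaved replay the final value is 1 rather than 2, which is the behavior the programmer did not expect when assuming that someMethod( ) runs atomically.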

Background - Inherent Races
Races not related to atomicity.
A data race is inherent if the following holds:
- Executing any segment of code in each thread atomically does not determine the order of accesses to the shared variable.
- The different orders in which the shared variable is accessed can be classified as “good” and “bad” according to the expected behavior of the program.

Motivation
- Race detection tools miss some races, or they report many false alarms.
- Even when a problem has been found by the best testing or verification techniques, there are situations in which it is not easy to fix.
  - Fixing software embedded in hardware is expensive (replacing, updating).
- It would be very desirable if software could fix its concurrency problems itself, on the fly.

Self-Healing Steps
- Problem detection: detect that something is wrong with the system
- Problem localization: find the root cause of the problem
- Problem healing: apply a fix to the problem
- Healing assurance: use the localization stage to check/prove the self-healing action

Problem Detection
Eraser algorithm
- Detects so-called apparent data races
Principle:
- For each variable, it maintains the variable's state and its set of candidate locks
- A race is detected whenever:
  + the variable is in the shared state
  + the set of candidate locks becomes empty

Extended Eraser Algorithm
Figure 1: Possible states of a shared variable
- Virgin: the variable has not been initialized yet.
- Exclusive: the variable is accessed only by the thread which initialized it.
- Shared: the variable is read by multiple threads.
- Shared-modified: the variable is read and written by multiple threads.
- Race: a data race on this variable has been detected (because no lock or a wrong lock was used when accessing the variable).

An Example of Detection
    static class Flight {
        private int soldSeats;
        …
        Flight ( ) { soldSeats = 0; }
        boolean bookTicket ( ) { soldSeats++;
State of soldSeats over time:
- before any access: Virgin
- Main: new Flight ( ); → Exclusive <Main>, C(v) = {}
- T1: <lock> bookTicket ( ); → Shared <T1>, C(v) = {lock}
- T2: bookTicket ( ); → Race <T2>, C(v) = {}
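The lockset idea behind this example can be sketched as follows. This is a simplified state machine in the spirit of the original Eraser algorithm with an added Race state; the slides do not spell out the exact transitions of the extended algorithm, so the intermediate state names reached here may differ from the figure. LocksetSketch, access() and replayFlightExample() are hypothetical names, and threads and locks are represented by plain strings.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Simplified lockset tracking for one shared variable (an assumption-laden sketch).
class LocksetSketch {
    enum State { VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED, RACE }

    private State state = State.VIRGIN;
    private String owner;            // thread that first accessed the variable
    private Set<String> candidates;  // C(v), created when a second thread arrives

    // Records one access and returns the resulting state.
    State access(String thread, boolean isWrite, Set<String> locksHeld) {
        switch (state) {
            case VIRGIN:
                owner = thread;
                state = State.EXCLUSIVE;
                return state;
            case EXCLUSIVE:
                if (thread.equals(owner)) {
                    return state;
                }
                candidates = new HashSet<>(locksHeld);
                state = isWrite ? State.SHARED_MODIFIED : State.SHARED;
                break;
            case SHARED:
                candidates.retainAll(locksHeld);
                if (isWrite) {
                    state = State.SHARED_MODIFIED;
                }
                break;
            case SHARED_MODIFIED:
                candidates.retainAll(locksHeld);
                break;
            default: // RACE is terminal
                return state;
        }
        // A written, shared variable with no common lock left is reported as a race.
        if (state == State.SHARED_MODIFIED && candidates.isEmpty()) {
            state = State.RACE;
        }
        return state;
    }

    // Replays the Flight example: Main initializes, T1 books under a lock, T2 books.
    static State replayFlightExample(boolean t2HoldsLock) {
        LocksetSketch soldSeats = new LocksetSketch();
        Set<String> none = Collections.emptySet();
        Set<String> lock = Collections.singleton("lock");
        soldSeats.access("Main", true, none);
        soldSeats.access("T1", true, lock);
        return soldSeats.access("T2", true, t2HoldsLock ? lock : none);
    }

    public static void main(String[] args) {
        System.out.println("T2 without lock: " + replayFlightExample(false));
        System.out.println("T2 with lock:    " + replayFlightExample(true));
    }
}
```

When T2 books a ticket without holding the lock, the candidate set becomes empty and the variable moves to the Race state, matching the last step of the example.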

Problem Localization
- Often hard work, even for a programmer.
- This paper looks for pre-specified data race bug patterns in the code, with the aid of information collected by the race detector.
- Formal methods are used to reduce the number of false alarms with reasonable overhead.

Atomicity Violation Bug Patterns
- load-store bug pattern: x++;
- test-and-use bug pattern: if (p != null) p = p.next;
- repeated test-and-use bug pattern: while (p != null) …
Bytecode of the test-and-use pattern:
    0: aload_0
    1: getfield #2
    4: ifnull 18
    7: aload_0
    8: aload_0
    9: getfield #2
    12: getfield #3
    15: putfield #2
    18: …

An Example of a Bug Pattern
    static class Flight {
        private int soldSeats;
        …
        Flight ( ) { soldSeats = 0; }
        boolean bookTicket ( ) { soldSeats++;
Bytecode of soldSeats++ (load-store pattern):
    2: getfield #2
    5: iconst_1
    6: iadd
    7: putfield #2

Healing Atomicity Races
Influencing the scheduler: forcing a context switch
- yield( ) or sleep(0) is called so that the scheduler gives the thread a full time slice in which to execute the atomic section
- safe and legal solution
- only decreases the probability of race manifestation
Executed by T1 and T2:
    Thread.yield( );
    2: getfield #2
    5: iconst_1
    6: iadd
    7: putfield #2

Healing Atomicity Races
Influencing the scheduler: temporary changes of the priorities
- the priority is raised so that the scheduler gives the thread a full time slice in which to execute the atomic section
- safe and legal solution
- only decreases the probability of race manifestation
- strongly dependent on the OS and JVM
    Thread.setPriority (MAXPRIORITY);
    …
    Thread.setPriority (originalPriority);
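Both scheduler-based actions could look roughly like this in plain Java. This is only a sketch: it assumes the healing code acts on the current thread, uses the standard Thread.yield(), Thread.setPriority(int) and Thread.MAX_PRIORITY, and the names SchedulerHealing, bookTicketAfterYield() and runAtMaxPriority() are hypothetical. Neither action removes the race; each only makes a preemption inside the critical sequence less likely.

```java
import java.util.function.IntSupplier;

// Sketch of the two scheduler-influencing healing actions (not a guarantee of atomicity).
class SchedulerHealing {
    private static int soldSeats = 0;

    // Yield first so the load-store sequence is more likely to start in a fresh time slice.
    static int bookTicketAfterYield() {
        Thread.yield();   // only a hint; the scheduler is free to ignore it
        soldSeats++;      // still a non-atomic load-store sequence
        return soldSeats;
    }

    // Raise the current thread's priority for the critical section, then restore it.
    static int runAtMaxPriority(IntSupplier criticalSection) {
        Thread current = Thread.currentThread();
        int originalPriority = current.getPriority();
        current.setPriority(Thread.MAX_PRIORITY);
        try {
            return criticalSection.getAsInt();
        } finally {
            current.setPriority(originalPriority);  // restored even if the section throws
        }
    }

    public static void main(String[] args) {
        System.out.println("seats after yield-based booking: " + bookTicketAfterYield());
        System.out.println("seats after priority-based booking: "
                + runAtMaxPriority(SchedulerHealing::bookTicketAfterYield));
    }
}
```

Restoring the priority in a finally block keeps the change temporary, as the slide requires, even when the protected code fails.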

Healing Atomicity Violation
Adding synchronization actions
- suitable use of mutexes (locks)
- prevents the accesses from being simultaneous, which heals the race
- can introduce new (and even more dangerous) bugs: deadlock
    HealingMutex.lock ( );
    …
    HealingMutex.unlock ( );
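A minimal sketch of this healing action, assuming the healing mutex is a java.util.concurrent.locks.ReentrantLock wrapped around the soldSeats++ sequence of the Flight example. The class name HealedFlight and the run() driver are hypothetical and only serve to exercise the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

// The load-store sequence is wrapped in an explicit lock (illustrative sketch).
class HealedFlight {
    private final ReentrantLock healingMutex = new ReentrantLock();
    private int soldSeats = 0;

    void bookTicket() {
        healingMutex.lock();
        try {
            soldSeats++;              // getfield, iconst_1, iadd, putfield now run under the lock
        } finally {
            healingMutex.unlock();    // always released, so the healing lock cannot be leaked
        }
    }

    // Starts the given number of threads, each booking the given number of tickets.
    static int run(int threads, int bookingsPerThread) {
        HealedFlight flight = new HealedFlight();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < bookingsPerThread; j++) {
                    flight.bookTicket();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return flight.soldSeats;      // safe to read: join() orders it after all bookings
    }

    public static void main(String[] args) {
        System.out.println("sold seats: " + run(4, 10000));
    }
}
```

The sketch uses a single lock that is never held while acquiring another one, which sidesteps the deadlock risk named above; healing code injected into a program that already uses locks has no such guarantee.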

Healing Inherent Races
Distinguish between “good” and “bad” orders
Thread 1:
    done = false
Thread 2:
    for (int i=1; i<100; i++) { print (i); }
    done = true
Each write to done is guarded by raceLock.lock( ); and raceLock.unlock( );
- Thread 2 → Thread 1: bad order (done = false)
- Thread 1 → Thread 2: good order (done = true)

Healing Inherent Races
Enforce the “good” order
- change the scheduling of the program: wait( ) and notify( )
Thread 1:
    raceLock.lock ( );
    done = false
    raceLock.unlock ( );
Thread 2:
    for (int i=1; i<100; i++) { print (i); }
    raceLock.lock ( );
    done = true
    raceLock.unlock ( );
wait ( ); and notify ( ); are added so that the write in Thread 2 follows the write in Thread 1.
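One way to enforce the good order with wait( ) and notify( ) is sketched below. It assumes that the good order is Thread 1 before Thread 2, uses a Java monitor object in place of the slide's raceLock, and introduces a hypothetical flag firstWriteFinished to record that Thread 1 has written; the slides do not show where the calls are actually placed.

```java
// Thread 2 waits until Thread 1 has written, so "done = true" always comes last (a sketch).
class EnforcedOrder {
    private final Object raceLock = new Object();
    private boolean done;
    private boolean firstWriteFinished = false;  // hypothetical helper flag

    void thread1Body() {
        synchronized (raceLock) {
            done = false;
            firstWriteFinished = true;
            raceLock.notifyAll();                // wake Thread 2 if it is already waiting
        }
    }

    void thread2Body() {
        // The print loop of the slide is omitted; only the racing write matters here.
        synchronized (raceLock) {
            while (!firstWriteFinished) {        // loop guards against spurious wakeups
                try {
                    raceLock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            done = true;
        }
    }

    private static void awaitThread(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Starts Thread 2 first to provoke the bad order, then returns the final value of done.
    static boolean runOnce() {
        EnforcedOrder program = new EnforcedOrder();
        Thread t2 = new Thread(program::thread2Body);
        Thread t1 = new Thread(program::thread1Body);
        t2.start();
        t1.start();
        awaitThread(t1);
        awaitThread(t2);
        return program.done;
    }

    public static void main(String[] args) {
        System.out.println("final done: " + runOnce());
    }
}
```

Because Thread 2 blocks until the flag is set, the final value of done is true regardless of which thread the scheduler runs first.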

Healing Inherent Races
Override the “bad” order
- concentrates on write accesses
- does not prevent the bad order from occurring
- assumes that the good order is known (T1 → T2)
- if the bad order (T2 → T1) is executed, only T2’s value is kept
Thread 1:
    raceLock.lock ( );
    done = false
    raceLock.unlock ( );
Thread 2:
    for (int i=1; i<100; i++) { print (i); }
    raceLock.lock ( );
    done = true
    raceLock.unlock ( );
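The overriding idea can be sketched as follows, under the assumption that "keeping T2's value" means Thread 1 skips its write once Thread 2 has already written. This is one possible reading of the slide, not necessarily how the authors implement it. The flag thread2HasWritten and the replay() driver are hypothetical, and both orders are replayed on a single thread so that the outcome is reproducible.

```java
// Thread 1's late write is suppressed when Thread 2 has already written (a sketch).
class OverrideBadOrder {
    private final Object raceLock = new Object();
    private boolean done;
    private boolean thread2HasWritten = false;  // hypothetical helper flag

    void thread1Write() {
        synchronized (raceLock) {
            if (!thread2HasWritten) {           // in the bad order this write is skipped
                done = false;
            }
        }
    }

    void thread2Write() {
        synchronized (raceLock) {
            done = true;
            thread2HasWritten = true;
        }
    }

    // Replays either the bad order (T2 then T1) or the good order (T1 then T2).
    static boolean replay(boolean badOrder) {
        OverrideBadOrder program = new OverrideBadOrder();
        if (badOrder) {
            program.thread2Write();
            program.thread1Write();
        } else {
            program.thread1Write();
            program.thread2Write();
        }
        return program.done;
    }

    public static void main(String[] args) {
        System.out.println("good order: " + replay(false));
        System.out.println("bad order:  " + replay(true));
    }
}
```

The bad order still occurs, but its effect on done is the same as that of the good order.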

Healing Assurance
Static analysis or bounded model checking is used to:
- reduce false alarms during detection and localization
- ensure that a new bug cannot be introduced
- help to choose a suitable healing method

Preliminary Results
The implemented race detector is able:
- to detect a wrong locking policy using the Eraser algorithm
- to detect the load-store atomicity bug pattern
- to localize the race and give enough information to the developer
- to heal the detected race by influencing the scheduler and also by additional synchronization

Experiments
- All tests were run on 1, 2, and 4 processors with 2, 3, 5, 10, and 15 working threads.
- The race was healed in all cases by a new explicit lock.

Related Work
ToleRace tool
- concentrates on asymmetric races
- based on transforming the critical regions of code
- at the end of a region, it checks for a race and can “tolerate” it by producing the correct result based on the local copies of shared variables
- can heal only read-write races; it does not heal write-write races

Conclusion
- applies self-healing in the context of fixing data races in Java programs
- explains three bug patterns leading to data races
- proposes possible self-healing actions to be taken when a bug pattern is detected
Future work
- implementation of efficient healing techniques
[LeVK08]