IBM Labs in Haifa, Software and Verification Technology © 2008 IBM Corporation Testing and Debugging Concurrent Software Shmuel Ur.

Slides:



Advertisements
Similar presentations
Operating Systems Semaphores II
Advertisements

Concurrency Important and difficult (Ada slides copied from Ed Schonberg)
Ch. 7 Process Synchronization (1/2) I Background F Producer - Consumer process :  Compiler, Assembler, Loader, · · · · · · F Bounded buffer.
CSCC69: Operating Systems
Chapter 6: Process Synchronization
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 5: Process Synchronization.
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Interprocess Communication
Annoucements  Next labs 9 and 10 are paired for everyone. So don’t miss the lab.  There is a review session for the quiz on Monday, November 4, at 8:00.
Pitfalls in Teaching Development and Testing of Concurrent Programs and How to Overcome them Eitan Farchi.
Multithreading. RHS – SWC 2 What is a thread Inside a single process, multiple threads can be executing concurrently A thread is the execution of (part.
/ PSWLAB Concurrent Bug Patterns and How to Test Them by Eitan Farchi, Yarden Nir, Shmuel Ur published in the proceedings of IPDPS’03 (PADTAD2003)
CS444/CS544 Operating Systems Synchronization 2/16/2006 Prof. Searleman
Concurrency.
Avishai Wool lecture Introduction to Systems Programming Lecture 4 Inter-Process / Inter-Thread Communication.
Review: Process Management Objective: –Enable fair multi-user, multiprocess computing on limited physical resources –Security and efficiency Process: running.
Synchronization Principles. Race Conditions Race Conditions: An Example spooler directory out in 4 7 somefile.txt list.c scores.txt Process.
Synchronization in Java Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.
Race Conditions CS550 Operating Systems. Review So far, we have discussed Processes and Threads and talked about multithreading and MPI processes by example.
A. Frank - P. Weisberg Operating Systems Introduction to Cooperating Processes.
Instructor: Umar KalimNUST Institute of Information Technology Operating Systems Process Synchronization.
Operating Systems CSE 411 CPU Management Oct Lecture 13 Instructor: Bhuvan Urgaonkar.
Discussion Week 3 TA: Kyle Dewey. Overview Concurrency overview Synchronization primitives Semaphores Locks Conditions Project #1.
Concurrency Recitation – 2/24 Nisarg Raval Slides by Prof. Landon Cox.
1 CSCD 330 Network Programming Lecture 13 More Client-Server Programming Sometime in 2014 Reading: References at end of Lecture.
1 Thread II Slides courtesy of Dr. Nilanjan Banerjee.
Internet Software Development More stuff on Threads Paul Krause.
Nachos Phase 1 Code -Hints and Comments
Object Oriented Analysis & Design SDL Threads. Contents 2  Processes  Thread Concepts  Creating threads  Critical sections  Synchronizing threads.
Testing and Debugging Version 1.0. All kinds of things can go wrong when you are developing a program. The compiler discovers syntax errors in your code.
Java Threads. What is a Thread? A thread can be loosely defined as a separate stream of execution that takes place simultaneously with and independently.
Games Development 2 Concurrent Programming CO3301 Week 9.
COMP 111 Threads and concurrency Sept 28, Tufts University Computer Science2 Who is this guy? I am not Prof. Couch Obvious? Sam Guyer New assistant.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
CSC321 Concurrent Programming: §5 Monitors 1 Section 5 Monitors.
Chapter 6 – Process Synchronisation (Pgs 225 – 267)
Consider the program fragment below left. Assume that the program containing this fragment executes t1() and t2() on separate threads running on separate.
Discussion Week 2 TA: Kyle Dewey. Overview Concurrency Process level Thread level MIPS - switch.s Project #1.
CIS 842: Specification and Verification of Reactive Systems Lecture INTRO-Examples: Simple BIR-Lite Examples Copyright 2004, Matt Dwyer, John Hatcliff,
CS399 New Beginnings Jonathan Walpole. 2 Concurrent Programming & Synchronization Primitives.
1 Condition Variables CS 241 Prof. Brighten Godfrey March 16, 2012 University of Illinois.
Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.
Implementing Lock. From the Previous Lecture  The “too much milk” example shows that writing concurrent programs directly with load and store instructions.
CS 153 Design of Operating Systems Winter 2016 Lecture 7: Synchronization.
Agenda  Quick Review  Finish Introduction  Java Threads.
Concurrent Programming in Java Based on Notes by J. Johns (based on Java in a Nutshell, Learning Java) Also Java Tutorial, Concurrent Programming in Java.
CS162 Section 2. True/False A thread needs to own a semaphore, meaning the thread has called semaphore.P(), before it can call semaphore.V() False: Any.
Lecture 5 Page 1 CS 111 Summer 2013 Bounded Buffers A higher level abstraction than shared domains or simple messages But not quite as high level as RPC.
Tutorial 2: Homework 1 and Project 1
Healing Data Races On-The-Fly
Background on the need for Synchronization
Lecture 25 More Synchronized Data and Producer/Consumer Relationship
Chapter 5: Process Synchronization
Threads and Concurrency in Java: Part 2
Lecture 2 Part 2 Process Synchronization
Implementing Mutual Exclusion
CSCI1600: Embedded and Real Time Software
Implementing Mutual Exclusion
CSE 451: Operating Systems Autumn 2003 Lecture 7 Synchronization
CSE 451: Operating Systems Autumn 2005 Lecture 7 Synchronization
CSE 451: Operating Systems Winter 2003 Lecture 7 Synchronization
CSE 153 Design of Operating Systems Winter 19
CS333 Intro to Operating Systems
Chapter 6: Synchronization Tools
Foundations and Definitions
CSCI1600: Embedded and Real Time Software
Presentation transcript:

IBM Labs in Haifa, Software and Verification Technology © 2008 IBM Corporation Testing and Debugging Concurrent Software Shmuel Ur

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 2 Why is Concurrent Testing Hard?  Concurrency introduces non-determinism  Multiple executions of the same test may have different interleavings (and different results!)  an interleaving is the relative execution order of the program threads  Very hard to reproduce and debug  No useful coverage measures for the interleaving space  Typically appear only in specific configurations  Therefore commonly found by users  Require large configurations to test  Represent only ~10% of the bugs but an un-proportional number are found late or by the customer -> Very expensive The costly effort of testing concurrency at system level is seemingly unavoidable

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 3 public void increase() { counter++; } Let’s consider the function increase(), which is a part of a class that acts as a counter Although written as a single “increase” operation, the “++” operator is actually mapped into three JVM instructions [load operand, increment, write-back] The Classic Java Example – A Counter

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 4 … load operand increment write-back … … load operand increment write-back … Thread A Thread B Context Switch counter = 3 4 “Race conditions represent one of your biggest enemies when it comes to programming concurrent applications” High Performance Java Platform Computing, Christopher & Thiruvathukal Sun Java Series, 2000 Counter Example – Continued

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 5 Riddle I: How many bugs do you see? public class Helloworld{ public static void main(String[] argv){ Hello helloThread = new Hello(); World worldThread = new World(); helloThread.start(); worldThread.start(); try{Thread.sleep(1000);} catch(Exception exc){}; System.out.print("\n"); } class Hello extends Thread{ public void run(){ System.out.print("hello "); } class World extends Thread{ public void run(){ System.out.print("world"); }

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 6 Riddle II: Understanding Synchronization Primitives  Locks protect the code segment and not the shared data  Obtaining a lock when accessing the shared resources  On an error path (e.g., an exception) does the system release the lock?  Consider the following class: class Conflict {  Conflict(…){ synchronized(Conflict.class){…}; };  void h(…){ synchronized(this){….};};  synchronized void g(…){….};  void r(…){…}; };  synchronized static void f(…){….};  Can f || g, f || h, f || r, g || h, g || r, h || r cause a conflict?

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 7 static final int NUM = 5; static int Global = 0; public static void main(String[] argv){ Global = 0; threads[0] = new Adder(1); threads[1] = new Adder(10); threads[2] = new Adder(100); threads[3] = new Adder(1000); threads[4] = new Adder(10000); // Start Threads for(int i=0;i<NUM;i++){ threads[i].start(); } try{ Thread.sleep(1000); } catch(Exception exc){} // Print Results System.out.println (Global); } How Much is ?

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 8 class Adder extends Thread int Local; Adder(int i){ Local = i; } public void add(){ example.Global += Local; } public void run(){ add(); } class Adder

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 9 Coverage Results without ConTest

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 10 Coverage Results with ConTest

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 11 Java Source code public void add(){ example.Global += Local; } Byte Code Method void add() 0 getstatic #3 3 aload_0 4 getfield #2 7 iadd 8 putstatic #3 11 return method Add

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 12 A Bug Published by the ConTest Project Team

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 13 static void transfer(Transfer t) { balances[t.fundFrom] -= t.amount; balances[t.fundTo] += t.amount; } Expected Behavior: Money should pass from one account to another Observed Behavior: Sometimes the amount taken is not equal to the amount received Possible bug: Thread switch in the middle of money transfers Atomicity is Never Ensured

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 14  Assume Atomic transfer (can't be implemented locally in Java)  balances[t.fundFrom] -= t.amount;  balances[t.fundTo] += t.amount;  Assume a counting loop that checks if total remains the same  Does it work now? Atomicity is Never Ensured II

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 15 B A Scenario 1 The train enters the tunnel. The semaphore automatically turns the light to red. Signalman A sets the light to green. Signalman B sends a message to signalman A, that the train has exited the tunnel. The train exits the tunnel. Signalman A sends a message to signalman B, that a train is in the tunnel.

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 16 B A Scenario 2 The first train enters the tunnel. The semaphore automatically turns the light to red. A sets the light to green. B sends a message to A, that the first train has exited the tunnel. The first train exits the tunnel. Signalman A sends a message to signalman B, that a train is in the tunnel. The second train approaches the tunnel and stops at the red light. The second train enters the tunnel.

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 17 B Scenario 3 A The first train enters the tunnel – the semaphore fails to turn the light red! A sets the signal to green and lets the second train pass. B sends a message to A, that the first train has exited the tunnel. The first train exits the tunnel. Signalman A sends a message to signalman B, that a train is in the tunnel. The second train approaches the tunnel. The second train enters and leaves the tunnel. Signalman A runs to the track and signals to the train to stop and sets the light to red, manually.

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 18 The first train enters the tunnel – the semaphore fails to turn the light red! Signalman A sends a message to signalman B, that a train is in the tunnel. The third train approaches the tunnel and stops. Signalman A runs to the track to set the light to red and signal other trains to stop. But a second train arrives and enters the tunnel before he has time to change the light to red. The driver sees something but doesn’t have time to stop. The driver of the second train is suspicious that something is wrong and stops in the middle of the tunnel. Signalman A sends another message to signalman B, that a train is in the tunnel. B Scenario 4 A A interprets the message as meaning that both trains have exited the tunnel, and he sets the light to green. B sends a message to A, that the train has exited the tunnel. The first train exits the tunnel. The third train enters the tunnel. The second train driver decides to back out of the tunnel. The second and third trains collide.

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 19 Bug Found by ConTest in Websphere Site Analyzer  Crawler, a ~1000 line Java component, is part of the Websphere Site Analyzer used to perform content analysis of web sites  Bug description:  if (connection != null) connection.setStopFlag();  Connection is checked to be !null  CPU is lost  Connection is set to null before CPU is regained  If this happens before connection.setStopFlag(); is executed, an exception is taken  This bug was found while we were still testing ConTest  This bug should (also) have been found in unit testing...

IBM Labs in Haifa, Software and Verification Technology © 2008 IBM Corporation Some Typical Concurrent Bug Patterns

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 21 Not-Atomic  An operation is assumed to be atomic but is actually not  Source code operations often seem to the inexperienced programmer to be atomic when they are not  Example: x++

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 22 Two stage access:  We are given two tables  To change a record in the second table, the first table is queried and then the second  Each table is protected by a separate lock lock [ First query key1 -> key2 ] window -> the tables can be changed here lock [ Second query key2 -> record to be changed ] Two-Stage-Access

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 23 Wrong/No-Lock Wrong lock or no lock  Protection of thread one does not apply to thread two  There is an access protocol that is not followed due to:  A new team member  An attempt to improve performance Thread 1 Thread 2 Synchronized (o){ x++; x++; }

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 24 Initialization-Sleep  One example is adding sleep() statements to ensure that only the correct interleavings occur  Partial, non-consistent results are used by the thread that assumes that initialization is done

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 25 Lost-Notify  Losing notify: the notify is “lost” because it occurs before the thread executes the wait() primitive  The gap was created because the programmer didn’t think the notify would occur before the wait Thread 1 Thread 2 synchronized (o){ o.notifyAll(); } Synchronized (o){ o.wait(); }

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 26 Blocking-Critical-Section  Blocking critical section  In the design of a critical section protocol we assume that the thread executing the critical section will eventually exit  This assumption might be broken if the code is written by a third party or a different group

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 27 Orphaned-Thread  The tale of the orphaned thread  A single master thread drives actions of other threads  Messages are put on the queue by the master thread and processed by the worker’s threads  Abnormal termination of the master thread results in the remaining threads being orphaned  The system often blocks

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 28 Unintentional-Different-Thread  A call to an API (typically a GUI API) is assumed to be in the same thread but is actually in a different thread causing the order of locking to sometimes change resulting in a deadlock  Nasty when you have a deadlock in a single thread program…

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 29 Condition-For-Wait  Missing condition enclosing the wait  When returning from a wait the programmer forgets to check, or checks incorrectly if the reason for which he waited still holds  In Java 1.5 a wait can decide to terminate. Is your code ready?  From  A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied. In other words, waits should always occur in loops, like this one:  synchronized (obj) {  while ( )  obj.wait(timeout);... // Perform action appropriate to condition  }

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 30 Mode = Delay Direction = Out Link IP UDP Application Java Tool Link IP UDP Application Java ConTest/UDP Link IP UDP Java Tool Link IP UDP Application Java ConTest/UDP Delaying messages with ConTest/UDP Network 123 AB

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 31 Mode = No-Noise Link IP UDP Application Java Tool Link IP UDP Application Java Link IP UDP Java Link IP UDP Application Java Delaying messages with ConTest/UDP Network AB ConTest/UDP

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 32 Mode = Block Direction = Out Link IP UDP Application Java Tool Link IP UDP Application Java Link IP UDP Java Link IP UDP Application Java Losing messages with ConTest/UDP Network 123 XXX AB ConTest/UDP

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 33 Mode = No-Noise Link IP UDP Application Java Tool Link IP UDP Application Java Link IP UDP Java Link IP UDP Application Java Losing messages with ConTest/UDP Network 123 XXX 4 4 AB ConTest/UDP

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 34 Some Interesting Observation  In practice thread switches are few and far between  The probability that the previous bug will be found is low  Synchronization usually operate as a no-op  Removing all synchronizations usually will not impact testing results!  Not knowing the synchronization primitives exact definition does not impact testing but the program is incorrect  Exception in synchronization: do you still have the lock?  What is the synchronization on?  Thread scheduling for small applications is almost deterministic in simple environment  Each environment has its own interleaving  Customer on the first day find bugs in well tested applications!

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 35 How Does a Noise Maker Work?  Find places which are “concurrently interesting”  Places whose relative interleaving may change the result of the program: access to shared variables, synch primitives  Possibly use bug patterns to identify such places  Lost notify  Sleep()  Condition before synchronization  Modify the program by adding interleaving changing mechanisms  Usually sleep(), yield() but more advance mechanisms exist  Make sure no new bug is introduced  Make sure that the interleaving change is random  Download to see one

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 36 With ConTest each Test Goes a Long Way

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 37 ConTest Debug Options – many other options will talk about some of them later  Deadlock  Location of each thread  Cycle of waiting on locks  Lock discipline violation  Lock taking trace  Lock Discipline – causing and healing  orange_box – remember {last_values_size} of values and locations for each variable. Created for null pointer exception  Replay – record and reuse replay information  Talk with (socket/keyboard) ConTest while the application is running

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 38 Unit Testing Concurrent Code with ConTest 1.Remove non-relevant code (optional) 2.Create a number of tests 3.Apply ConTest to the code 4.Run each test multiple time and measure synchronization coverage 1.If a bug is found, use ConTest to debug, fix, go to 3 2.If deadlock violation found, fix, go to 3 5.If coverage not sufficient, analyze 1.Need to rerun tests, go to 4 2.Need to write new tests, go to 2 6.Done

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 39 Function and System Test with ConTest  Increased efficiency in finding intermittent bugs  Negligible change to process in Java (in C recompilation is required)  Installed in WebSphere size projects in a day  Works under any test automation used (e.g., Rational tools)  Additional benefits:  Coverage measurements  Aids in debugging  Replay feature

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 40 Measuring Concurrent Coverage  ConTest also measures coverage  The feature that people wanted most is not to find bugs   How do you know if the tests are any good from concurrency view?  Synchronization coverage  Make sure that every synchronization primitive did something  See - Applications of synchronization coverage

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 41 Finding Potential Deadlocks  Standard idea – find violation of lock discipline  Our innovation – enable cross run collection of data  Attention is being paid to it in MSR

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 42 Automatic Debugging as Feature Selection  Think of each instrumentation point as a feature  Execute the program many times with different subset  Each instrumentation point gets a score  P(i) = P(success|X i )/P(!success|X i )  There are false negatives and possitives  The nodes with the highest score may be related to location of failure

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 43 Score Distributed Along Program Lines

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 44 The First Largish Program We Tried It On

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 45 This Does Not work When The Problem is Easy

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 46 Looking at the Differences on the First Large Program

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 47 Looking on the Differences at the Second Large Program

IBM Labs in Haifa, Software and Verification Technology Formal Verification and Testing Technologies © 2008 IBM Corporation 48 Healing Concurrent Programs  In testing we try to make them more likely to fail  In the field we may want to make them less likely to fail  General problem statement – the program sometimes fail (depends on the interleaving)  General solution – eliminate, or make the interleavign the lead to bugs less likely; but do not cause deadlocks  We do it for races, deadlocks …  Not quite as useful but very interesting  Done as a project called SHADOWS in the EU