1 Towards Automated Verification Through Type Discovery
Joint work with Rahul Agarwal
Scott D. Stoller
State University of New York at Stony Brook
2 Automated Verification of Infinite-State Systems
To make it feasible, we must restrict either:
– the system. Examples: push-down systems, infinite chains of finite-state automata
– or the properties (and, slightly, the system). Example: variables are initialized before they are used.
We restrict the properties.
3 Verification With Type Systems
Many properties can be checked with types. Properties of sequential programs:
– Operations are applied to appropriate arguments
– Procedures in an API are called in the correct sequence. Example: a file is opened before it is read or written
– Objects are encapsulated. Example: the links of a linked list are encapsulated in the list
– Information flow
4 Verification With Type Systems (continued)
Properties of concurrent programs:
– Race-freedom (absence of race conditions)
– Deadlock-freedom
– Atomicity (called isolation or serializability in databases)
Properties of distributed programs:
– Correctness of authentication protocols (Cryptyc)
5 Why Are Type Systems Attractive?
The concept of types is familiar to programmers.
Extended types can be embedded in comments.
Types support compositional verification.
– Type-checking a method depends on the types (not the code) of other methods.
Types provide a clean separation of “guessing” and checking.
– Inference algorithms, heuristics, and hints from the user can be used freely to “guess” types.
– Only the soundness of the type checker needs to be shown.
6 Disadvantages of Type Systems
Not all properties can be checked with types.
– But several useful properties can be.
Complete static type inference is intractable (NP-complete or worse) for many interesting type systems.
Annotating new code with types takes time.
Annotating legacy code with types takes a long time.
– The developer first needs to understand the code.
7 Type Discovery
Type discovery: guess (and then check) types for a program based on information from run-time monitoring.
Is type discovery guaranteed to be effective?
– Of course not, if type inference is provably hard.
– Type discovery must rely on heuristics to generalize concrete (run-time) relationships to static (syntactic) relationships.
Why is type discovery likely to be effective?
– Assuming static intra-procedural type inference is feasible, monitored executions do not need to achieve high statement coverage, because intra-procedural inference propagates the discovered types into code that was never executed.
8 Related Work: Invariant Discovery
Invariant discovery in Daikon [Ernst et al.]
– Daikon considers a set of candidate predicates defined by a grammar, with a limit on the size of the predicates.
– Daikon inserts instrumentation at a designated program point to discover which of these predicates hold at that program point.
– Type discovery is “harder”: a single type annotation may depend on what happens at many program points. Example: with race-free types, an annotation on a field declaration depends on which locks are held at every point at which the field is accessed.
9 Outline
Type Discovery for Verification of Race-Freedom
– Background on race conditions
– Related work on analysis of race conditions
– Overview of type system for race-freedom
– Type discovery algorithm
– Experimental results
Sketch of Type Discovery for Verification of:
– Atomicity
– Deadlock-freedom
– Safe region-based memory management
10 Race Conditions
A race condition occurs when two threads access a shared variable and:
– at least one access is a write, and
– no synchronization is used to prevent the accesses from being simultaneous.
Race conditions indicate that the program may produce different results if the schedule (order of operations) changes.
– In many systems, the thread scheduler is loosely specified, so the program is effectively nondeterministic.
Race conditions often reflect synchronization errors.
11 Race Condition: Example
class Account {
    int balance = 0;
    void deposit(int x) { this.balance = this.balance + x; }
}
Account a = new Account();
fork { a.deposit(10); }
A deposit is lost if both threads read this.balance and then both threads update this.balance. This reflects a race condition on this.balance. Making deposit(int) synchronized eliminates the race condition and the error.
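The following is a minimal plain-Java sketch of the lost update. It assumes that `fork` can be modeled by starting a java.lang.Thread. The class name RaceDemo and the method depositSync are hypothetical and do not come from the slides. The unsynchronized total may fall below the expected value on some runs, while the synchronized version is always exact.

```java
// Plain-Java rendering of the Account example (hypothetical class RaceDemo);
// `fork` is modeled by starting a java.lang.Thread.
class RaceDemo {
    static class Account {
        int balance = 0;

        // Unsynchronized read-modify-write: two threads may both read the old
        // balance and then both write it back, losing one deposit.
        void deposit(int x) {
            this.balance = this.balance + x;
        }

        // Same update while holding the lock on `this`.
        synchronized void depositSync(int x) {
            this.balance = this.balance + x;
        }
    }

    // Two threads each deposit 1, n times; returns the final balance.
    static int run(boolean sync, int n) {
        Account a = new Account();
        Runnable task = () -> {
            for (int i = 0; i < n; i++) {
                if (sync) {
                    a.depositSync(1);
                } else {
                    a.deposit(1);
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join(); // join() also makes the threads' writes visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.balance;
    }

    public static void main(String[] args) {
        // The unsynchronized total may be below 2000000 on some runs;
        // the synchronized total is always 2000000.
        System.out.println("unsynchronized: " + run(false, 1_000_000));
        System.out.println("synchronized:   " + run(true, 1_000_000));
    }
}
```

Because the outcome of the unsynchronized run depends on the schedule, a single run that happens to produce the right total proves nothing about other runs. This is exactly the limitation of testing that motivates the rest of the talk.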
12 Approaches to Detecting Race Conditions
Run-time monitoring. Pioneering work: Eraser [Savage+, 1997]
– Pro: automatic. Con: no guarantees about other executions.
Static analysis
– RacerX [Engler+, 2003] effectively finds some race conditions but relies on unsound heuristics.
Type systems: Race Free Java, PRFJ, Multithreaded Cyclone
– Pro: well-typed programs are guaranteed to be race-free. Con: requires manual annotations (greatly reduced by type discovery).
13 Parameterized Race Free Java (PRFJ) [Boyapati & Rinard, OOPSLA 2001]
Each object is associated with an owner and a root owner.
The owner is normally an object denoted by a final expression, or self.
The lock on the root owner must be held when the object is accessed.
Example (ownership diagram): x:LinkedList with owner(x) = self, and y:Link with owner(y) = x.
14 Parameterized Race Free Java (continued)
In some special cases, race conditions are avoided without locks. Special owner values indicate these cases:
– thisThread: the object is unshared
– unique: the reference is the only reference to the object
– readonly: the object cannot be updated
An object's owner may change in ways that do not cause races, specifically, from unique to any other owner.
Unique references are transferred with this syntax:
y = x--; // equivalent to: y = x; x = null;
19 Annotations in PRFJ
Classes are annotated with one or more owner parameters.
– The first parameter specifies the owner of this object.
– The remaining parameters (if any) specify owners of fields, method parameters, return values, etc.
Owner parameters are instantiated at uses of class names.
Methods are annotated with a requires l1, l2, ... clause.
– The locks of the root owners of l1, l2, … must be held at all call sites.
22 Example PRFJ program
class Account {
    int balance;
    public Account(int balance) { this.balance = balance; }
    void deposit(int x) requires this { this.balance = this.balance + x; }
}
Account a1 = new Account(0);
a1.deposit(10);
Account a2 = new Account(0);
fork { synchronized (a2) { a2.deposit(10); } }
23 The cost of PRFJ
About 25 annotations/KLOC in Boyapati & Rinard’s experiments with PRFJ.
24 Towards Type Discovery for PRFJ
Type systems like PRFJ seem to be a promising practical approach to verification of race-freedom, if more annotations can be obtained automatically.
Static type inference for (P)RFJ is NP-complete [Flanagan & Freund, 2004].
Type discovery for PRFJ builds on work on run-time race detection.
25 Run-time Race Detection: The Lockset Algorithm [Savage et al., 1997]
The lockset algorithm detects violations of a simple locking discipline (each shared variable is protected by some lock that is held whenever the variable is accessed) in monitored executions.
Following the locking discipline implies race-freedom.
Fully automatic.
No guarantee about other executions.
26 Core Lockset Algorithm
C(v) = set of locks that have protected variable v so far.
Initialization: C(v) := set of all locks.
On an access to v by thread t: C(v) := C(v) ∩ locks_held(t).
If C(v) is empty, issue a warning: locking discipline violated (potential for race conditions).
Lockset algorithm = core lockset algorithm plus special treatment for initialization of variables and for read-only variables.
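The core refinement can be sketched in Java as follows. This sketch uses a hypothetical encoding in which threads, locks, and variables are plain strings. It deliberately leaves out the special treatment of initialization and read-only variables.

```java
import java.util.*;

// Core lockset refinement (sketch). Threads, locks, and variables are plain
// strings here, which is a simplification for illustration.
class CoreLockset {
    // C(v); a missing entry stands for the initial value "set of all locks".
    private final Map<String, Set<String>> candidates = new HashMap<>();
    // locks_held(t)
    private final Map<String, Set<String>> held = new HashMap<>();
    // Variables for which the locking discipline was violated.
    private final Set<String> warnings = new LinkedHashSet<>();

    void acquire(String thread, String lock) {
        held.computeIfAbsent(thread, t -> new HashSet<>()).add(lock);
    }

    void release(String thread, String lock) {
        Set<String> locks = held.get(thread);
        if (locks != null) {
            locks.remove(lock);
        }
    }

    // On an access to v by thread t: C(v) := C(v) intersect locks_held(t).
    void access(String thread, String var) {
        Set<String> locksHeld = held.getOrDefault(thread, Collections.emptySet());
        Set<String> c = candidates.get(var);
        if (c == null) {
            // (set of all locks) intersect locks_held(t) = locks_held(t)
            c = new HashSet<>(locksHeld);
            candidates.put(var, c);
        } else {
            c.retainAll(locksHeld);
        }
        if (c.isEmpty()) {
            warnings.add(var); // locking discipline violated: potential race
        }
    }

    Set<String> warnings() {
        return Collections.unmodifiableSet(warnings);
    }
}
```

Initialization is not special-cased here. As a result, the very first access to a variable with no lock held already produces a warning. The full lockset algorithm avoids such false alarms.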
27 Overview of Type Discovery for PRFJ
1. Identify unique references using static analysis.
2. Instrument the program using an automatic source-to-source transformation.
3. Execute the instrumented program, which writes information to a log file.
4. Analyze the log to discover:
   a. owners of fields, parameters, and return values
   b. owners in class declarations
   c. values of non-first owner parameters
   d. the requires clause for each method
5. Run intra-procedural type inference to get types for local variables.
6. Run the type checker.
28 Step 1: Static Analysis of Unique References
We use a variant of the uniqueness analysis in [Aldrich, Kostadinov, & Chambers 2002].
– Determine which parameters are lent, i.e., when the method returns, no new references to the argument exist.
– Determine which expressions are unique references, based on the lent annotations and known sources of unique references, namely object allocation expressions.
The analysis is flow-insensitive and context-insensitive.
29 Step 2: Instrumentation
To help infer the owner of a field, method parameter, or return value x, we monitor a set S(x) of objects that are “values” of x.
– If x is a field of class C, S(x) contains objects stored in that field of instances of C.
– If x is a method parameter, S(x) contains arguments passed through that parameter.
– If x is a return value, S(x) contains objects returned through it.
FE(x): the set of final expressions that are syntactically legal at the declaration of x. These are the candidate owners of x.
– Final expressions are built from final variables (including this), final fields, and static final fields.
30 Step 2: Instrumentation (continued)
After an object o is added to S(x), every access to o is intercepted and the following information is updated:
– lkSet(x,o): the set of locks held at every access to o, excluding accesses through a unique reference.
– rdOnly(x,o): boolean, true if no field of o was written.
– shar(x,o): boolean, true if o is shared (accessed by more than one thread).
– val(x,o,e), where e ∈ FE(x): the value of e at an appropriate point for x and o:
   – if x is a field: immediately after the constructor invocation that initialized o;
   – if x is a parameter of method m: immediately before calls to m where o is passed through parameter x.
31 Step 3: Execute the instrumented program
The instrumented program writes information to a log file.
32 Step 4.a: Discover owners for fields, method parameters, and return values
Note: the first matching rule wins.
– If the Java type of x is an immutable class (e.g., String), then owner(x) = readonly.
– If (∀ o ∈ S(x): ¬shar(x,o)), then owner(x) = thisThread.
– If (∀ o ∈ S(x): rdOnly(x,o)), then owner(x) = readonly.
– If (∀ o ∈ S(x): o ∈ lkSet(x,o)), then owner(x) = self.
33 Step 4.a: Discover owners for fields, method parameters, and return values (continued)
– If for some e ∈ FE(x), (∀ o ∈ S(x): val(x,o,e) ∈ lkSet(x,o)), then owner(x) = e.
– Otherwise, owner(x) = thisOwner, where thisOwner is the first owner parameter of the class.
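The rules of Step 4.a can be read as a first-match decision procedure. The sketch below assumes a hypothetical ObjInfo class that summarizes the logged facts for each o in S(x). Owners are returned as plain strings, and locks are compared by object identity, as Java monitors are.

```java
import java.util.*;

// Hypothetical per-object summary produced by the instrumentation for one o in S(x).
class ObjInfo {
    final Object obj;                  // o itself
    final boolean shared;              // shar(x,o)
    final boolean readOnly;            // rdOnly(x,o): no field of o was written
    final Set<Object> lockSet;         // lkSet(x,o), compared by object identity
    final Map<String, Object> values;  // val(x,o,e) for candidate final expressions e

    ObjInfo(Object obj, boolean shared, boolean readOnly,
            Set<?> locks, Map<String, ?> values) {
        this.obj = obj;
        this.shared = shared;
        this.readOnly = readOnly;
        this.lockSet = Collections.newSetFromMap(new IdentityHashMap<>());
        this.lockSet.addAll(locks);
        this.values = new HashMap<>(values);
    }
}

class OwnerRules {
    // Applies the rules of Step 4.a in order; the first matching rule wins.
    // Note: with an empty S(x) every "for all" rule holds vacuously.
    static String ownerOf(boolean immutableType, List<ObjInfo> s, List<String> finalExprs) {
        if (immutableType) return "readonly";
        if (s.stream().allMatch(o -> !o.shared)) return "thisThread";
        if (s.stream().allMatch(o -> o.readOnly)) return "readonly";
        if (s.stream().allMatch(o -> o.lockSet.contains(o.obj))) return "self";
        for (String e : finalExprs) {
            // val(x,o,e) must be a held lock for every monitored o
            if (s.stream().allMatch(o -> o.values.containsKey(e)
                    && o.lockSet.contains(o.values.get(e)))) {
                return e;
            }
        }
        return "thisOwner";
    }
}
```

Identity comparison matters here. Two distinct but equal collections (for example, two empty ArrayLists) are different locks.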
34 Example: owner of this param of MyThread(..)
public class MyThread extends Thread {
    public ArrayList l;
    public MyThread(ArrayList l) { this.l = l; }
    public void run() {
        synchronized (this.l) { l.add(new Integer(10)); }
    }
    public static void main(String args[]) {
        ArrayList l = new ArrayList();
        MyThread m1 = new MyThread(l);
        MyThread m2 = new MyThread(l);
        m1--.start();
        m2--.start();
    }
}
36 Example: owner of l field and parameter
(Same MyThread program as in slide 34.)
Lockset table: obj = l, lkset = {l}. The object is shared and written, and it is in its own lockset, so owner = self.
39 Step 4.b: Discover owners in class declarations
Monitor a set S(C) of instances of class C.
– If (∀ o ∈ S(C): ¬shar(C,o)), then owner(C) = thisThread.
– If (∀ o ∈ S(C): o ∈ lkSet(C,o)), then owner(C) = self.
– Otherwise, owner(C) = thisOwner.
Use owner(C) as the first owner parameter in the declaration of C.
40 Example: owner of class MyThread
(Same MyThread program as in slide 34.)
43 Step 4.c: Discover non-first owner parameters
Assume the uses of these parameters in class declarations are given. Example:
class ArrayList { public boolean add(Object o){…} … }
If the owner parameter is used as the owner of a method parameter (like eltOwner), instantiate it based on the discovered owner of the method parameter.
A similar technique is used if the owner parameter is used as the owner of a field.
44 Example
(Same MyThread program as in slide 34.)
47 Step 4.d: Discover requires clause
– run methods are given an empty requires clause.
– Each method declared in a class with owner thisThread (from Step 4.b) is given an empty requires clause.
– For other classes, the requires clause of a method contains every method parameter p (including the implicit this parameter) such that the method contains a field access p.f outside the scope of a synchronized(p) statement.
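A sketch of the requires-clause rule follows. It assumes each method body has already been summarized, in a hypothetical form, as a list of field accesses p.f. Each access is tagged with the expressions whose synchronized blocks enclose it. The FieldAccess class and the infer method are illustrative names only.

```java
import java.util.*;

// Sketch of the requires-clause rule. FieldAccess is a hypothetical summary of
// one field access p.f found in a method body.
class RequiresClause {
    static final class FieldAccess {
        final String receiver;      // p in p.f
        final Set<String> syncedOn; // expressions q with the access inside synchronized(q)

        FieldAccess(String receiver, Set<String> syncedOn) {
            this.receiver = receiver;
            this.syncedOn = syncedOn;
        }
    }

    // params includes the implicit parameter "this".
    static List<String> infer(String methodName, String classOwner,
                              List<String> params, List<FieldAccess> accesses) {
        if (methodName.equals("run") || classOwner.equals("thisThread")) {
            return List.of(); // empty requires clause
        }
        List<String> requires = new ArrayList<>();
        for (String p : params) {
            for (FieldAccess a : accesses) {
                if (a.receiver.equals(p) && !a.syncedOn.contains(p)) {
                    requires.add(p); // p.f is accessed outside synchronized(p)
                    break;
                }
            }
        }
        return requires;
    }
}
```

For Account.deposit, the unguarded access this.balance yields `requires this`, which matches the annotation in the PRFJ example.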
48 Step 5: Intra-procedural type inference
a. Introduce fresh, distinct formal owner parameters for unknown owners in variable declarations and object allocation expressions.
b. Derive equality constraints between owners from assignment statements and method invocations.
c. Solve the constraints in almost linear time using the standard union-find algorithm.
The test suite does not need full (or even high) statement coverage, because intra-procedural type inference propagates owner information into unexecuted parts of the code.
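Step 5.c can be sketched with a textbook union-find. Union by rank plus path compression gives the almost-linear bound. The names are hypothetical. The `fix` operation, which seeds an equivalence class with an owner discovered in Step 4, is an assumption of this sketch rather than a documented part of the tool.

```java
import java.util.*;

// Union-find over owner variables (sketch). Owner names are strings; "fix"
// records a known (discovered) owner for a variable's equivalence class.
class OwnerUnionFind {
    private final Map<String, String> parent = new HashMap<>();
    private final Map<String, Integer> rank = new HashMap<>();
    private final Map<String, String> known = new HashMap<>(); // root -> known owner

    String find(String x) {
        if (!parent.containsKey(x)) {
            parent.put(x, x);
            rank.put(x, 0);
        }
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p); // path compression
        }
        return p;
    }

    // Records that owner variable var must equal the known owner `owner`.
    void fix(String var, String owner) {
        String r = find(var);
        String old = known.get(r);
        if (old != null && !old.equals(owner)) {
            throw new IllegalStateException("conflicting owners " + old + " and " + owner);
        }
        known.put(r, owner);
    }

    // Adds the equality constraint a = b (union by rank).
    void union(String a, String b) {
        String ra = find(a), rb = find(b);
        if (ra.equals(rb)) return;
        String ka = known.get(ra), kb = known.get(rb);
        if (ka != null && kb != null && !ka.equals(kb)) {
            throw new IllegalStateException("conflicting owners " + ka + " and " + kb);
        }
        String root;
        int rankA = rank.get(ra), rankB = rank.get(rb);
        if (rankA < rankB) {
            parent.put(ra, rb);
            root = rb;
        } else if (rankA > rankB) {
            parent.put(rb, ra);
            root = ra;
        } else {
            parent.put(rb, ra);
            rank.put(ra, rankA + 1);
            root = ra;
        }
        known.remove(ra);
        known.remove(rb);
        String k = (ka != null) ? ka : kb;
        if (k != null) known.put(root, k);
    }

    // Known owner of x's class, or null if it remains a formal owner parameter.
    String resolve(String x) {
        return known.get(find(x));
    }
}
```

Equivalence classes that end up without a known owner correspond to the fresh formal owner parameters introduced in step a.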
49 Example: owner of local variable l
(Same MyThread program as in slide 34.)
51 Implementation
The source-to-source transformation is implemented in the front end of the Kopi compiler.
For efficiency, we monitor selected objects only:
– For a field x with type C, monitor at most one object per allocation site of C.
– For a method parameter or return value x, monitor at most one object per call site of that method.
– Only record values of final expressions of the form this and this.f, where f is a final field.
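The per-site sampling policy amounts to remembering which (declaration, site) pairs already have a monitored object. Below is a minimal sketch. MonitorSampler and the string encoding of sites (for example "MyThread.java:9") are hypothetical.

```java
import java.util.*;

// Sketch of the sampling policy: monitor at most one object per
// (declaration, site) pair. For a field x the site is an allocation site of
// the field's class; for a parameter or return value it is a call site.
// Site strings such as "MyThread.java:9" are a hypothetical encoding.
class MonitorSampler {
    private final Set<List<String>> sampled = new HashSet<>();

    // Returns true the first time a given (declaration, site) pair is seen.
    boolean shouldMonitor(String declaration, String site) {
        return sampled.add(List.of(declaration, site));
    }
}
```

Using a two-element list as the key avoids the collisions that naive string concatenation of the two names could cause.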
52 Experience with Type Discovery for PRFJ
Evaluated on 5 multi-threaded server programs from Boyapati & Rinard, totaling about 1600 lines of code.
The original PRFJ code contains 70 annotations.
Our system discovers correct types for completely unannotated application code after one simple execution of each program.
The user only needs to annotate Java API classes that have multiple owner parameters.
Run-time overhead is about 20%.
Now we are tackling Jigsaw, a large web server.
54 Atomicity
Atomicity corresponds to serializability or isolation in database transactions.
A method m is atomic if every execution of the program is equivalent to an execution in which the events in each invocation of m occur contiguously.
Atomicity is a common requirement. How can we check it automatically?
(Figure: an original execution and an equivalent serial execution; the legend distinguishes events that are part of an invocation of m from other events.)
55 Type Discovery for Atomicity Types
Flanagan & Qadeer proposed a type system for verifying atomicity, based on ideas from type systems for race-freedom and on commutativity properties:
– Lock acquire operations are right-movers, i.e., they commute to the right past any event of another thread.
– Lock release operations are left-movers.
– Race-free accesses to variables are both-movers, i.e., both right-movers and left-movers.
They annotated about 7 KLOC of Sun’s Java standard library, using 23 annotations/KLOC.
They verified atomicity of many methods and found a few atomicity violations (errors).
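The mover classification supports Lipton-style reduction. A block can be treated as atomic if its events are right-movers, then at most one non-mover, then left-movers, with both-movers allowed anywhere. The sketch below only checks that shape on a given sequence of mover kinds. It illustrates the reduction idea and is not Flanagan & Qadeer's actual type rules. Mover and Reduction are hypothetical names.

```java
import java.util.*;

// Mover kinds from the slide (sketch).
enum Mover { RIGHT, LEFT, BOTH, NON }

class Reduction {
    // True if the events match (RIGHT|BOTH)* [NON] (LEFT|BOTH)*: right-movers
    // before the commit point, left-movers after, and at most one non-mover.
    static boolean reducible(List<Mover> events) {
        boolean committed = false;
        for (Mover m : events) {
            switch (m) {
                case BOTH:
                    break;
                case RIGHT:
                    if (committed) return false;
                    break;
                case NON:
                    if (committed) return false;
                    committed = true;
                    break;
                case LEFT:
                    committed = true;
                    break;
            }
        }
        return true;
    }
}
```

A synchronized method body (acquire, race-free accesses, release) has the shape RIGHT, BOTH..., LEFT, so it passes the check. A release followed by a later acquire does not pass.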
56 Type Discovery for Atomicity Types (continued)
Type discovery for atomicity types can be built on type discovery for race-free types.
– No additional run-time instrumentation is needed.
We are implementing a type checker for a variant of their type system and designing a type discovery algorithm for it.
57 Deadlock
Deadlock occurs if there is a cycle in the waits-for graph, which has
– a node for each thread, and
– an edge from thread i to thread j if thread i is waiting to acquire a lock held by thread j.
We don’t consider other causes of deadlocks, such as lost notifies.
Lock ordering is a classic deadlock-prevention strategy:
– Define a partial order on locks.
– A thread holding a lock L may attempt to acquire only locks that are larger than L in the partial order.
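A blocked thread waits for one lock, and each lock has one holder. Therefore every node of the waits-for graph has at most one outgoing edge, and a map from thread to thread can represent the graph. The small sketch below detects a cycle under that assumption. WaitsFor and the thread names are hypothetical.

```java
import java.util.*;

// Sketch: detect a cycle in a waits-for graph. Because a blocked thread waits
// for exactly one lock, and a lock has one holder, each thread has at most
// one outgoing edge, so a map thread -> thread suffices.
class WaitsFor {
    static boolean hasDeadlock(Map<String, String> waitsFor) {
        for (String start : waitsFor.keySet()) {
            Set<String> seen = new HashSet<>();
            String t = start;
            // Follow edges until the path ends or revisits a thread.
            while (t != null && seen.add(t)) {
                t = waitsFor.get(t);
            }
            if (t != null) {
                return true; // revisited a thread: the path entered a cycle
            }
        }
        return false;
    }
}
```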
58 Types for Deadlock-Freedom
Boyapati & Rinard extended PRFJ to verify deadlock-freedom. The type annotations
– define a set of lock levels,
– define an ordering on lock levels, and
– assign a level to each expression used as a lock.
The typing rules ensure that locks are acquired in an order consistent with the ordering on lock levels.
Lock-level parameters allow lock-level polymorphism: a piece of code can be used with locks of different levels.
59 Type Discovery for Deadlock-Freedom
Instrument the program to discover an ordering on locks [Havelund '00]: L1 < L2 if L2 is acquired while L1 is held.
If the ordering contains cycles, the program is untypable.
Introduce a new lock level and assign all currently unassigned <-minimal locks to that level. Repeat until all locks have been assigned a level. The order on lock levels is the order in which they were introduced.
Assign a level to each field, method parameter, and method return value potentially used as a lock (i.e., with owner = self), based on its recorded values.
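The level-assignment loop is a layered topological sort of the observed lock order. The sketch below assumes locks are named by strings and that every lock mentioned in an observed pair also appears in the lock set. Ignoring re-acquisition of an already-held (reentrant) lock is an additional assumption of this sketch. LockLevels is a hypothetical name.

```java
import java.util.*;

// Sketch of lock-level assignment. Each observed pair (l1, l2) means l2 was
// acquired while l1 was held, i.e. l1 < l2. Every lock named in a pair must
// also be in `locks`. Lock names are hypothetical strings.
class LockLevels {
    static Map<String, Integer> assign(Set<String> locks, List<List<String>> pairs) {
        Map<String, Integer> level = new HashMap<>();
        int next = 0;
        while (level.size() < locks.size()) {
            // Collect the unassigned locks with no unassigned predecessor.
            List<String> minimal = new ArrayList<>();
            for (String l : locks) {
                if (level.containsKey(l)) continue;
                boolean hasPred = false;
                for (List<String> p : pairs) {
                    // Assumption: re-acquiring a held (reentrant) lock adds no constraint.
                    if (p.get(1).equals(l) && !p.get(0).equals(l)
                            && !level.containsKey(p.get(0))) {
                        hasPred = true;
                        break;
                    }
                }
                if (!hasPred) minimal.add(l);
            }
            if (minimal.isEmpty()) {
                throw new IllegalStateException("lock order has a cycle: untypable");
            }
            for (String l : minimal) level.put(l, next); // one new level per round
            next++;
        }
        return level;
    }
}
```

Each round assigns every currently minimal lock at once. Locks that are unordered with respect to each other therefore share a level.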
60 Type Discovery for Deadlock-Freedom (continued)
If a class has owner self and its instances appear in multiple lock levels, then introduce a formal lock-level parameter in the declaration of the class.
Use intra-procedural type inference (as for PRFJ) to propagate lock levels from fields, method parameters, and method return types throughout the body of each method.
61 Region-Based Memory Management
Region-based memory management offers some of the safety and convenience of garbage collection while avoiding much of its overhead. The Real-Time Specification for Java (RTSJ) supports it.
A region is an area of memory whose lifetime is tied to an execution scope (e.g., a method invocation) or to several scopes (in concurrent programs).
While executing in the scope associated with a region, threads can allocate objects in the region.
When all threads have exited the scope corresponding to a region, the entire region (and hence all objects in it) is automatically de-allocated.
62 Safe Region-Based Memory Management
To ensure that de-allocation of a region does not create dangling references, an object in one region may not contain a reference to an object in a shorter-lived region.
In RTSJ, attempting to create such a reference causes an IllegalAssignmentError to be thrown.
63 Types for Safe Region-Based Memory Management
[Boyapati, Salcianu, Beebee, & Rinard 2003] describes a type system that verifies the absence of this error.
The type system is similar in structure to the PRFJ type system: every object has an owner, indicated in its type.
But the semantics is different: here, an object's owner indicates where the object is allocated (e.g., in which region).
(Figure: ownership diagram.)
64 Types for Safe Region-Based Memory Management (continued)
In the type system, owners are represented by:
– expressions of type ScopedMemory. In RTSJ, instances of ScopedMemory represent regions (just as instances of Socket represent sockets).
– formal owner parameters, as in PRFJ.
(Figure: ownership diagram in which ScopedMemory objects act as owners.)
65 Types for Safe Region-Based Memory Management (continued)
The types express:
– ownership of objects
– the outlives relation between regions, based on the associated scopes.
The typing rule for a field update o.f = p requires that p's owner is a region that outlives o's owner.
(Figure: ownership diagram showing o.f referring to p, with p's region outliving o's.)
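The field-update rule can be sketched as an ancestor check. The sketch assumes that regions nest as a tree, in which each region's parent belongs to an enclosing scope and therefore lives at least as long. Regions and region names such as "heap" are hypothetical. The outlives relation is taken to be reflexive, so references within one region are allowed.

```java
import java.util.*;

// Sketch of the outlives check, assuming regions nest as a tree: each region's
// parent is the region of the enclosing scope, which lives at least as long.
// Region names such as "heap" are hypothetical.
class Regions {
    private final Map<String, String> parent = new HashMap<>();

    void nest(String region, String enclosing) {
        parent.put(region, enclosing);
    }

    // Reflexive: a region outlives itself and every region nested inside it.
    boolean outlives(String a, String b) {
        for (String r = b; r != null; r = parent.get(r)) {
            if (r.equals(a)) return true;
        }
        return false;
    }

    // Field update o.f = p is allowed only if p's region outlives o's region.
    boolean fieldUpdateAllowed(String regionOfO, String regionOfP) {
        return outlives(regionOfP, regionOfO);
    }
}
```

A store that fails this check is the kind of assignment for which RTSJ throws IllegalAssignmentError at run time. The type system rules such stores out statically.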
66 Type Discovery for Safe Region-Based Memory Management
About 23 annotations/KLOC were needed in Boyapati et al.’s experiments with several programs.
Claim: type discovery can significantly reduce the number of annotations, using techniques similar to those for PRFJ.
At method entry points, record the following relationships, where x and y are parameters (including this) or fields:
– x refers to a ScopedMemory object, and y refers to an object allocated in the associated region;
– x and y refer to objects allocated in the same region.
Instrument method exit points similarly.
Generate candidate annotations based on these relationships.
67 Summary and Future Work
Initial experiments suggest that type discovery is:
– an effective approach to automated verification of some properties that can be expressed with types,
– relatively insensitive to the choice of test inputs, and
– effective even without high code coverage in the monitored executions.
We are applying type discovery to larger programs and more type systems.
– Stay tuned! We expect to have more experimental results soon.