Designing Programming Languages to Improve Software Quality David J. Pearce Software Quality New Zealand, August
Null Pointers “I call it my billion-dollar mistake. It was the invention of the null reference in 1965” -- Tony Hoare “Of all the exceptions a Java programmer might encounter, the null-pointer exception is among the most dreaded, and for good reason: it is one of the least informative exceptions that a program can signal.” -- Eric Allen
(from java/lang/String.java) Experiences with Null Pointers /** * … NullPointerException if sb is * null */ boolean contentEquals(StringBuffer sb) { synchronized(sb) { … } }
(also from java/lang/String.java) /** * Tests if this string starts with the * specified prefix. * prefix the prefix. true if the character sequence… */ public boolean startsWith(String prefix) { … char pa[] = prefix.value; … }
/** * Tests if this string starts with the * specified prefix. * prefix the prefix. true if the character sequence… */ public boolean startsWith(String prefix) { … char pa[] = prefix.value; … } Found 83/1101 java.lang methods were misdocumented!
Thoughts Why is documentation bad? –Because programmers write it –Programmers have deadlines –Documentation doesn’t help meet deadlines When is documentation not bad? –Think about types in Java –Method types automatically documented! –Compiler ensures they are accurate
@NonNull Types void Integer x) { x.toString(); // safe! } void g(Integer x) { x.toString(); // Syntax Error } void h(Integer x) { if(x != null) { x.toString(); // Safe! }
JML Specifications public class IntVec { private int data[]; private int length = 0; invariant data != null; invariant length >= 0; invariant length <= data.length; requires size > 0; public IntVec(int size) { data = new int[size]; } ensures \result >= 0; public int size() { return length; } …
More JML Specifications public requires s1 != null && s2 != requires s2.length==0 && s1.length==0; ensures !\result; ensures \result == (\exists int i; 0<=i && i<s1.length && s1[i]<s2[i] && (s1.length<s2.length && public static pure model boolean lessThan(char[] s1, char[] s2); (from JML spec of java/lang/String.java)
So … ? PROBLEM: modern languages make compile-time verification unnecessarily hard… ANSWER: design a language to simplify verification: –Pure Functions vs Impure Methods –Value Semantics –Structural subtyping & Flow Typing –Unbound Integers and Rationals –First-class Collections (sets, lists, maps) –Strict Concurrency Model (e.g. non-reentrancy)
Whiley
Overview What is Whiley? –Hybrid functional / imperative language –Designed specifically for verification –Compiles to JVM (also prototype C backend) Why another language? –Verification is really hard –Many features of Java it even harder! –I think it’s basically impossible for Java –See ESC/Java and JML as good efforts here
A Zoo of Unusual Types! Primitives: –e.g. Collections (lists, sets, maps): –e.g. Records and Tuples: –e.g. Unions and Intersections: –e.g. Negations –e.g. [int]{string}{int=>string} !int anyintrealcharnullbool{int x, int y}(int,int)int|nullint&null
Flow Typing
A flow-sensitive approach to type checking Types declared only for parameters and returns Variables can have different types! Conditionals and/or assignments cause retyping int sum([int] items): r = 0 for item in items: r = r + item return r
Flow Typing Type tests automatically retype variables! –(even on the false branch) define Circle as {int x, int y, int r} define Rect as {int x, int y, int w, int h} define Shape as Circle | Rect real area(Shape s): if s is Circle: return PI * s.r * s.r else: return s.w * s.h
Flow Typing & Unions Cannot treat null|int like an int Must distinguish cases by explicit type testing null|int indexOf(string str, char c): … [string] split(string str, char c): idx = indexOf(str,c) if idx is int: below = str[0..idx] above = str[idx..] return [below,above] else: return [str] Can safely treat x as int here
Verification
Function f() : –Accepts an arbitrary integer … –Should return a natural number … –But, this implementation is broken! define nat as int where $ >= 0 nat f(int x): return x A compile time error!
define nat as int where $ >= 0 nat f(int x): if x >= 0: return x else: return 0 Function f() : –Accepts an arbitrary integer … –Returns a natural number … –This implementation satisfies the spec! OK, because x implicitly a nat
Verification Function g() : –Accepts a positive number … –And returns a natural number … –But, how to know pos subtypes nat ? define nat as int where $ >= 0 define pos as int where $ > 0 nat g(pos x): return x OK, because pos implies nat
Verification Function h() : –Accepts a natural number … –And returns a positive number … –But, how to know nat+1 gives pos ? define nat as int where $ >= 0 define pos as int where $ > 0 pos h(nat x): return x + 1 OK, because nat+1 gives pos
Verification Function h1() and h2() are identical define nat as int where $ >= 0 define pos as int where $ > 0 pos h1(nat x): return x + 1 int h2(int x) requires x>=0, ensures $>0: return x + 1
Verification Function sum() : –Accepts a list of natural numbers … –Then adds them together … –And returns a natural number. define nat as int where $ >= 0 nat sum([nat] list): r = 0 for x in list where r >= 0: r = r + x return r Ok, because adding nat to nat gives nat
Implementation
Compiler Overview Developed 2009 – present 304 classes (80KLOC) with 934 end-end tests Also, 133 verifier tests and type system unit tests
Performance
Eclipse Plugin Update Site: