
1 Learning Universally Quantified Invariants of Linear Data Structures
Pranav Garg (1), Christof Loding (2), P. Madhusudan (1) and Daniel Neider (2)
(1) University of Illinois at Urbana-Champaign
(2) RWTH Aachen, Germany

2 Black-box learning of invariants
- Renewed interest in applying learning to invariant synthesis [Sharma et al. CAV-12], [Sharma et al. SAS-13], [Kong et al. APLAS-10].
- Advantages with respect to white-box techniques:
  - verification of complex programs with simple invariants
  - generalization
  - extremely scalable machine learning algorithms can be applied to verification
[Figure: the learner proposes a hypothesis H; the teacher checks H against the program.]

3 Active Learning and Passive Learning
- Active learning: the learner queries the teacher with equivalence and membership queries.
- Passive learning: given a sample = (examples, counter-examples), learn the simplest concept consistent with it.
[Figure: an active learner sends membership/equivalence queries to a teacher and receives yes/no answers; a passive learner receives a sample S.]

4 Overview
- Build active learning algorithms for learning quantified formulas over linear data structures (arrays/lists).
  - introduce Quantified Data Automata (QDAs), a normal form for such invariants
  - build an active learning algorithm for QDAs
- Build a passive learning algorithm using the active learning algorithm.
  - based on an imprecise teacher that answers questions with respect to the samples
- Introduce elastic QDAs (EQDAs) that translate to decidable logics.
  - develop learning algorithms for EQDAs
[Figure: a list pointed to by head. Example invariant: the list pointed to by head is sorted.]

5 Program Configuration/Data words
[Figure: a program configuration (a list with pointer variables head and i) and the corresponding data word.]
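
The encoding of a program configuration as a data word can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `to_data_word`, the letter representation (a set of pointer names paired with a data value), and the concrete list values are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

# One letter of a data word: (pointer variables at this cell, data value).
# This representation is an assumption made for illustration.
Letter = Tuple[FrozenSet[str], int]


def to_data_word(cells: List[int], pointers: Dict[str, int]) -> List[Letter]:
    """Annotate each cell with the pointer variables that point to it."""
    word = []
    for index, value in enumerate(cells):
        here = frozenset(p for p, pos in pointers.items() if pos == index)
        word.append((here, value))
    return word


# Assumed example configuration: head points to the first cell, i to the fourth.
word = to_data_word([2, 4, 7, 8, 9, 3], {"head": 0, "i": 3})
```

Cells that no pointer refers to carry the empty set, which plays the role of a blank symbol.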

6 Quantified Data Automata
- QDAs represent universally quantified properties of linear data structures.
[Figure: an example QDA with transitions on head, y1 and y2 and the data formula data(y1) <= data(y2).]

7 Quantified Data Automata
Fix P: the program pointer variables
Fix Y: the set of quantified variables
Fix F: a numerical abstract domain over data formulas
- A QDA over linear data structures:
  - reads a data word annotated with pointers P and Y
  - checks whether the data stored at these positions satisfy a data property
- A QDA accepts a data word w with pointers P if it accepts all possible extensions of w with valuations for Y.
[Figure: the example QDA with transitions on head, y1 and y2 and the data formula data(y1) <= data(y2).]

8 Valuation words
- Valuation word = data word over P + valuation for Y
- Universal quantification: a QDA accepts a data word iff it accepts ALL corresponding valuation words.
[Figure: a data word with pointers head and i, and two of its valuation words, which additionally mark positions for y1 and y2.]
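
The universal quantification over valuation words can be sketched as a brute-force enumeration. This is a simplified illustration under stated assumptions: a valuation places each quantified variable on some cell of the word, and `sortedness_check` is a hypothetical stand-in for running an actual QDA on one valuation word.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

DataWord = List[Tuple[frozenset, int]]   # (pointer variables at the cell, data value)
Valuation = Dict[str, int]               # quantified variable -> cell index


def accepts_data_word(
    word: DataWord,
    quantified: Sequence[str],
    accepts_valuation: Callable[[DataWord, Valuation], bool],
) -> bool:
    """Accept iff every placement of the quantified variables is accepted."""
    for choice in product(range(len(word)), repeat=len(quantified)):
        valuation = dict(zip(quantified, choice))
        if not accepts_valuation(word, valuation):
            return False
    return True


def sortedness_check(word: DataWord, valuation: Valuation) -> bool:
    """Stand-in for a QDA run: if y1 is not after y2, require data(y1) <= data(y2)."""
    y1, y2 = valuation["y1"], valuation["y2"]
    return y1 > y2 or word[y1][1] <= word[y2][1]


sorted_word = [(frozenset({"head"}), 1), (frozenset(), 3), (frozenset(), 3), (frozenset(), 6)]
unsorted_word = [(frozenset({"head"}), 2), (frozenset(), 4), (frozenset(), 3)]
```

With these definitions, `accepts_data_word(sorted_word, ["y1", "y2"], sortedness_check)` holds, while the same call on `unsorted_word` fails because one valuation places y1 on the cell holding 4 and y2 on the cell holding 3.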

9 Quantified Data Automata
- Deterministic, finite, register automata over words
  - each state q is labeled with a data formula f(q)
- For a valuation word, the QDA reads the pointer and universal variables and stores their data values in the register reg.
- At the final state, the QDA checks whether these data values satisfy the formula labeling the state.
  - reg satisfies f(q): accepts the valuation word
  - reg does not satisfy f(q): rejects the valuation word
[Figure: a run on a valuation word with reg: head → 2, y1 → 4, i → 8, y2 → 8 and f(q) = data(y1) <= data(y2).]
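
A run of such an automaton on a single valuation word can be sketched as below. The class `QDA`, its transition-table layout, the state names, and the convention that missing transitions lead to a state labeled true are all assumptions made for illustration; the example automaton uses only the pointer head and is a hand-written guess at a sortedness QDA, not one taken from the slides.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Symbol = FrozenSet[str]                  # variables at one cell; empty set = blank
Registers = Dict[str, int]
Formula = Callable[[Registers], bool]
ValuationWord = List[Tuple[Symbol, int]]


def sym(*names: str) -> Symbol:
    return frozenset(names)


def true(reg: Registers) -> bool:
    return True


class QDA:
    def __init__(self, initial: str, delta: Dict[Tuple[str, Symbol], str],
                 formulas: Dict[str, Formula], default: str = "top"):
        self.initial = initial
        self.delta = delta          # (state, symbol) -> state
        self.formulas = formulas    # state -> data formula; unlisted states are "true"
        self.default = default      # assumed target of missing transitions

    def accepts(self, word: ValuationWord) -> bool:
        state, reg = self.initial, {}
        for symbol, value in word:
            for var in symbol:      # store the data value of every variable read
                reg[var] = value
            state = self.delta.get((state, symbol), self.default)
        return self.formulas.get(state, true)(reg)


# Hypothetical sortedness QDA: head, then y1, then y2, with blanks in between.
SORTED = QDA(
    "q0",
    {
        ("q0", sym("head")): "q1", ("q0", sym("head", "y1")): "q2",
        ("q0", sym("head", "y1", "y2")): "q3",
        ("q1", sym()): "q1", ("q1", sym("y1")): "q2", ("q1", sym("y1", "y2")): "q3",
        ("q2", sym()): "q2", ("q2", sym("y2")): "q3",
        ("q3", sym()): "q3",
    },
    {"q3": lambda reg: reg["y1"] <= reg["y2"]},
)
```

Valuation words in which y2 occurs before y1 fall into the default state and are accepted, so only correctly ordered pairs are constrained by the data formula.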

10 Learning QDAs
- QDAs are finite automata which output data formulas.
- Lift Angluin’s L* algorithm for learning DFAs to learn QDAs.
- Given a teacher, the unique minimal QDA can be learned in time polynomial in the size of this minimal QDA.
[Figure: the example QDA over head, y1 and y2, viewed as a regular expression that outputs data(y1) <= data(y2).]

11 Elastic Quantified Data Automata (EQDA)
- Subclass of QDAs which translate to decidable logics
  - Array Property Fragment (APF) [Bradley et al. VMCAI-06]
  - decidable fragment of Strand over lists [Madhusudan et al. POPL-11]
- Cannot test whether two universal variables are a bounded distance apart.
- Restriction for EQDAs: all transitions on blank symbols (no pointer or universal variable) must be self-loops.
[Figure: a QDA relating y1 and y2 that falls outside APF, next to an EQDA that falls inside APF.]
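
The EQDA restriction on blank transitions can be checked directly on a transition table. The sketch below assumes a table of the form (state, symbol) -> state with the empty set as the blank symbol; the two example tables are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Symbol = FrozenSet[str]
Delta = Dict[Tuple[str, Symbol], str]
BLANK: Symbol = frozenset()              # a cell with no pointer or universal variable


def is_elastic(delta: Delta) -> bool:
    """True iff every transition on the blank symbol is a self-loop."""
    return all(src == dst for (src, symbol), dst in delta.items() if symbol == BLANK)


# y1, any number of blanks, then y2: blanks only loop.
elastic = {
    ("q0", frozenset({"y1"})): "q1",
    ("q1", BLANK): "q1",
    ("q1", frozenset({"y2"})): "q2",
}

# y1, exactly one blank, then y2: the blank transition changes state.
bounded_distance = {
    ("q0", frozenset({"y1"})): "q1",
    ("q1", BLANK): "q2",
    ("q2", frozenset({"y2"})): "q3",
}
```

The second table counts cells between y1 and y2, which is the kind of bounded-distance test the restriction rules out, so `is_elastic` rejects it.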

12 Elastic Quantified Data Automata (EQDA)
Unique minimal over-approximation theorem: for a QDA A, the language of valuation words it accepts has a unique minimal over-approximation that is accepted by an EQDA A_el.
- The construction of A_el from a given QDA A is called elastification.
- Learning EQDAs <= learning QDAs + elastification.
[Figure: A together with the elastic over-approximations A_el, B_el and C_el.]

13 Passively learning QDAs
Given the samples S+ and S-, the teacher uses them to answer the active learner. The teacher wants the active learner to construct a QDA that includes S+ and excludes S-.
- Membership query for s:
  - if s belongs to S+, return yes
  - if s belongs to S-, return no
  - otherwise, return no (errs on the side of keeping the learned concept semantically small)
- Equivalence query:
  - checks whether the conjectured invariant is consistent with S+ and S-
The learned QDA might not be minimal, but it is usually small. Running time is polynomial in the size of the learned QDA.
[Figure: the passive learner consists of a teacher, which holds the sample S+, S-, and an active learner.]
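
The sample-based teacher described on this slide can be sketched as a small class. This is an illustration only: the class name `SampleTeacher`, the yes/no membership answers, and the representation of a hypothesis as a Boolean predicate over words are assumptions, and the example words are plain tuples of data values rather than valuation words.

```python
from typing import Callable, Hashable, Iterable, Optional


class SampleTeacher:
    """Answers an active learner's queries using only the samples S+ and S-."""

    def __init__(self, positives: Iterable[Hashable], negatives: Iterable[Hashable]):
        self.positives = set(positives)
        self.negatives = set(negatives)

    def membership(self, s: Hashable) -> bool:
        # Yes for S+; no for S- and also for unknown words (keeps the concept small).
        return s in self.positives

    def equivalence(self, hypothesis: Callable[[Hashable], bool]) -> Optional[Hashable]:
        """Return a counter-example from the samples, or None if consistent."""
        for s in sorted(self.positives, key=repr):
            if not hypothesis(s):
                return s
        for s in sorted(self.negatives, key=repr):
            if hypothesis(s):
                return s
        return None


def is_sorted(word) -> bool:
    return all(a <= b for a, b in zip(word, word[1:]))


teacher = SampleTeacher(positives=[(1, 2), (1, 1, 3)], negatives=[(2, 1)])
```

Here `teacher.equivalence(is_sorted)` returns `None` because the hypothesis agrees with both sample sets, whereas a hypothesis that accepts everything is refuted by the negative sample `(2, 1)`.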

14 Experiments
- Run the program on arrays/lists of small bounded sizes, with data values from a bounded data domain, e.g. {0, 1, 2}.
- Extract the concrete data structures that manifest at loop headers.
- Obtain the set S+ on which passive learning is performed.
  - fix F to the Cartesian lattice of atomic formulas over the relations {=, <, ≤}
- Learn QDAs using Angluin’s algorithm.
  - the learner never asks long membership queries
  - the teacher, thus, often has correct answers
- The learned QDA is over-approximated to an elastic QDA to get a quantified invariant over decidable Strand or APF.
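
The first two steps, running a program on small inputs and recording the configurations seen at a loop header, can be sketched as follows. The program under test (an insertion sort), the function name `collect_samples`, and the representation of a configuration as (array contents, loop index) are assumptions chosen for the example; they are not taken from the experiments.

```python
from itertools import product
from typing import Sequence, Set, Tuple

Config = Tuple[Tuple[int, ...], int]     # (array contents, value of loop index i)


def collect_samples(max_len: int = 3, domain: Sequence[int] = (0, 1, 2)) -> Set[Config]:
    """Run insertion sort on all small arrays; record configurations at the loop header."""
    samples: Set[Config] = set()
    for n in range(max_len + 1):
        for values in product(domain, repeat=n):
            a = list(values)
            i = 0
            while True:
                samples.add((tuple(a), i))      # configuration at the outer loop header
                if i >= len(a):
                    break
                j = i
                while j > 0 and a[j - 1] > a[j]:  # insert a[i] into the sorted prefix
                    a[j - 1], a[j] = a[j], a[j - 1]
                    j -= 1
                i += 1
    return samples


samples = collect_samples()
```

Every recorded configuration has a sorted prefix a[0..i), which is the kind of property a learner would be expected to generalize from such a positive sample set.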

20 Related Work
- Daikon [Ernst et al. ICSE-00]
  - conjunctive Boolean learning
  - learns quantified invariants over arrays, to some extent
- Applications of learning in verification
  - rely-guarantee contracts [Cobleigh et al. TACAS-03, Alur et al. CAV-05]
  - stateful interfaces [Alur et al. POPL-05]
  - learning quantified invariants over predicates [Kong et al. APLAS-10]
- Machine learning algorithms for invariant synthesis [Sharma et al. CAV-12, SAS-13, ESOP-13]

21 Conclusion
- Learning universally quantified invariants over linear data structures
  - Quantified Data Automata (QDA) / elastic QDAs
  - Active learning for QDAs
  - Unique elastification
  - Algorithm for passive learning of QDAs/EQDAs
  - Experimental validation
Future Work:
- Extensions to trees to capture universally quantified properties like binary-search-tree, max-heap, …
- Combining automata-based structural learning with machine learning algorithms for learning data formulas
Thank You!

