Grigore Rosu Mahesh Viswanathan

Presentation transcript:

Testing Extended Regular Language Membership Incrementally by Rewriting
Grigore Rosu, Mahesh Viswanathan
University of Illinois at Urbana-Champaign, USA

Increasing Software Reliability
Current solutions:
- Human review of code and testing: most used in practice; usually ad hoc, with intensive human support
- (Advanced) static analysis: often scales up; false positives and negatives, annotations
- (Traditional) formal methods: model checking and theorem proving; general, good confidence, do not always scale up

Runtime Verification and Monitoring
Idea: let the system run and observe its execution trace. If the trace violates, or appears to violate, the requirements, then report an error or guide the program to avoid, or to hit, the error.

Runtime Verification and Monitoring
PathExplorer, developed jointly with Havelund:
- Used on 70,000 lines of C++ code (K9 Rover)
- Found a deadlock in ~10 seconds
- Confirmed a datarace suspicion
Runtime Verification Workshop: '01 France (CAV), '02 Denmark (CAV), '03 USA (CAV), '04 Spain (ETAPS), …

PathExplorer - Overview
[Diagram: Running program -> Events (socket) -> Observer]
(Joint work with Klaus Havelund of NASA Ames)

PathExplorer – the Observer
Module configuration:
  paxmodules
    module datarace = 'java pax.Datarace';
    module deadlock = 'java pax.Deadlock';
    module temporal = 'java pax.Temporal spec';
    module ERE = 'java pax.Ere spec';
  end
[Diagram: a dispatcher routes the event stream to the predictive analysis modules (datarace, deadlock) and to the specification-based monitoring modules (temporal, ERE); each module emits warnings.]

Why (Extended) Regular Expressions?
- Ordinary programmers and software engineers understand and use regular expressions (Perl, Python, etc.)
- Safety policies are often regular patterns on sequences of states/events: (idle* open (read + write)* close)*
- Complementation is needed to say what should not happen: ¬(any* start1 (¬end1)* start2 any*)

Extended Regular Expressions (ERE)
Regular expressions with complement:
  R ::= Φ | ε | A ∈ Σ | R + R | R · R | R* | ¬R
Language of an ERE:
  L(Φ) = Φ
  L(ε) = {ε}
  L(A) = {A}
  L(R + R') = L(R) ∪ L(R')
  L(R · R') = {ww' | w ∈ L(R), w' ∈ L(R')}
  L(R*) = (L(R))*
  L(¬R) = Σ* \ L(R)
Intersection: R ∩ R' := ¬(¬R + ¬R')

ERE Membership Problem
Given w ∈ Σ* and R, is it the case that w ∈ L(R)?
- Patterns in strings; many applications: programming languages (Perl, Python), molecular biology (Knight-Myers '95), monitoring
- Efficient solutions are of great practical interest
- From now on, n is the length of the word/trace w and m is the size of the ERE R
- n is typically much, much larger than m

What is known (I)
If R does not contain negations, then either:
- Transform R into an NFA of size O(m) (Aho '90): solution in time O(nm) and space O(m). Improved by Myers '92 (JACM): time/space O(nm / log n)
- Transform R into a DFA of size O(2^m) (Aho '90): solution in time O(nm) and space O(2^m). Note: transitions in a DFA take logarithmic time
Negations and their nesting make the membership problem highly non-trivial.

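Problems with Negation (I)
How to complement an NFA? Just complementing the set of final states is wrong!
[Figure: an NFA A and the automaton A' obtained from A by complementing its set of final states]
L(A) = {ab}
L(A') = {ab, a, ε}
The word ab is accepted by both automata, so A' does not recognize the complement of L(A).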
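
The pitfall can be reproduced with a few lines of Python. This is only a sketch: the four-state NFA below is a hypothetical one chosen to give the same two languages as the slide, not necessarily the automaton that was drawn, and the names `nfa_accepts` and `language` are illustrative.

```python
from itertools import product

def nfa_accepts(delta, start, finals, word):
    """Simulate an NFA by tracking the set of states reachable on `word`."""
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), ())}
    return bool(current & finals)

def language(delta, start, finals, alphabet="ab", max_len=3):
    """All accepted words of length <= max_len (enough for this tiny NFA)."""
    return {
        "".join(letters)
        for n in range(max_len + 1)
        for letters in product(alphabet, repeat=n)
        if nfa_accepts(delta, start, finals, "".join(letters))
    }

# Hypothetical NFA A: 0 -a-> 1, 1 -b-> 2 (accepting), 1 -b-> 3 (dead end).
STATES = {0, 1, 2, 3}
DELTA = {(0, "a"): {1}, (1, "b"): {2, 3}}
FINALS = {2}

lang_a = language(DELTA, 0, FINALS)                  # {"ab"}
lang_flipped = language(DELTA, 0, STATES - FINALS)   # {"", "a", "ab"}
```

Flipping the final states keeps `"ab"` in the language (via the dead-end state 3) and still rejects words such as `"b"`, so the result is not the complement.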

Problems with Negation (II)
- DFAs can be complemented safely by just complementing the set of final states, but the NFA -> DFA conversion implies an exponential state blowup!
- For k nested negations: 2^(2^(…(2^m)…)) states, a tower of k exponentials
- This makes the membership problem more complex (non-elementary with this approach) in the context of (nested) negations
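
A minimal sketch of the safe route, assuming the textbook subset construction: determinize first, then complement the acceptance condition. The NFA is a hypothetical example with L = {ab}, and `determinize`, `dfa_run` and `in_complement` are illustrative names.

```python
from itertools import product

def determinize(alphabet, delta, start):
    """Subset construction: DFA states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa, todo = {}, [start_set]
    while todo:
        subset = todo.pop()
        if subset in dfa:
            continue
        dfa[subset] = {}
        for symbol in alphabet:
            target = frozenset(
                t for s in subset for t in delta.get((s, symbol), ())
            )
            dfa[subset][symbol] = target
            todo.append(target)
    return start_set, dfa

def dfa_run(start_set, dfa, word):
    """Return the DFA state (a set of NFA states) reached on `word`."""
    state = start_set
    for symbol in word:
        state = dfa[state][symbol]
    return state

# Hypothetical NFA with L = {ab}: 0 -a-> 1, 1 -b-> 2 (accepting), 1 -b-> 3.
DELTA = {(0, "a"): {1}, (1, "b"): {2, 3}}
FINALS = {2}
START, DFA = determinize("ab", DELTA, 0)

def in_complement(word):
    # Safe on the DFA: accept iff no final NFA state is reachable on `word`.
    return not (dfa_run(START, DFA, word) & FINALS)

words = ["".join(p) for n in range(3) for p in product("ab", repeat=n)]
complement_words = {w for w in words if in_complement(w)}
```

Here the DFA has only four states, but in general an NFA with m states can produce up to 2^m subsets, and each further nested negation requires another determinization.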

What is known (II)
- Dynamic programming algorithm (Hopcroft-Ullman '79): time O(n^3 m) and space O(n^2 m)
- Special synchronized alternating automata:
  - (Yamamoto '02): intersection but not negation
  - (Kupferman-Zuhovitzky '02): general ERE
  - Time O(n^2 m) and space O(nm + kn^2), where k is the number of negations and intersections
- The algorithms above store the word; this is unacceptable in many practical situations
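
For comparison, here is a sketch of a cubic dynamic-programming membership test in the spirit of the first bullet. The tuple encoding of EREs and the names `membership_table` and `member` are assumptions made for illustration; this is not code from the cited work. One table of size O(n^2) is built per subexpression, and the whole word must be available up front.

```python
EMPTY, EPS = ("empty",), ("eps",)
def sym(a): return ("sym", a)
def alt(r, s): return ("alt", r, s)
def cat(r, s): return ("cat", r, s)
def star(r): return ("star", r)
def neg(r): return ("neg", r)

def membership_table(r, w):
    """T[i][j] is True iff w[i:j] is in L(r), for 0 <= i <= j <= len(w)."""
    n, tag = len(w), r[0]
    T = [[False] * (n + 1) for _ in range(n + 1)]   # "empty" stays all False
    spans = [(i, j) for i in range(n + 1) for j in range(i, n + 1)]
    if tag == "eps":
        for i in range(n + 1):
            T[i][i] = True
    elif tag == "sym":
        for i in range(n):
            T[i][i + 1] = (w[i] == r[1])
    elif tag == "alt":
        A, B = membership_table(r[1], w), membership_table(r[2], w)
        for i, j in spans:
            T[i][j] = A[i][j] or B[i][j]
    elif tag == "neg":
        A = membership_table(r[1], w)
        for i, j in spans:
            T[i][j] = not A[i][j]
    elif tag == "cat":
        A, B = membership_table(r[1], w), membership_table(r[2], w)
        for i, j in spans:
            T[i][j] = any(A[i][k] and B[k][j] for k in range(i, j + 1))
    elif tag == "star":
        A = membership_table(r[1], w)
        for length in range(n + 1):          # shorter spans first
            for i in range(n - length + 1):
                j = i + length
                T[i][j] = length == 0 or any(
                    A[i][k] and T[k][j] for k in range(i + 1, j + 1)
                )
    return T

def member(r, w):
    return membership_table(r, w)[0][len(w)]

ANY_WORD = neg(EMPTY)                        # complement of Φ is Σ*
NO_BB = neg(cat(ANY_WORD, cat(sym("b"), cat(sym("b"), ANY_WORD))))
```

With `NO_BB` ("no two consecutive b's"), `member(NO_BB, "abab")` holds and `member(NO_BB, "abba")` does not.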

Desired Behavior - Monitoring
Algorithms that process and then discard each event are desired in practice, since words or execution traces can be extremely long.
[Diagram: Running program -> Events (socket) -> Observer]

Challenges and Talk Overview
- What is the lower space/time bound of the ERE monitoring problem (to process one event)? Ω(2^(c·√m)) for space
- What is a reasonable upper bound for the ERE monitoring problem (to process one event)? Rewriting algorithm in O(22m2) space/time

Lower Bound for ERE Monitoring (I)
Consider the language (Chandra-Kozen-Stockmeyer '81 in alternation; Kupferman-Vardi '98 in model checking)
  Lk = {u # w # u' $ w | w ∈ {0,1}^k and u, u' ∈ {0,1,#}*}
We show that:
- There is an ERE Rk of size Θ(k^2) with L(Rk) = Lk
- Any monitoring algorithm for Lk needs Ω(2^k) space
So we can conclude that the space lower bound for ERE monitoring is Ω(2^(c·√m)).

Lower Bound for ERE Monitoring (II)
  Lk = {u # w # u' $ w | w ∈ {0,1}^k and u, u' ∈ {0,1,#}*}
  Rk = (¬$)* $ (¬$)* ∩ ???
- There should be exactly one $ symbol: (¬$)* $ (¬$)*
- There should be some sequence of 0, 1, #, followed by a # and then by a w: (0+1+#)* # ???
- Each letter in w should appear after $ at exactly the same position:
  ∩ (i = 0 … k) [(0+1)^i 0 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 0 (0+1)^(k-i-1) + (0+1)^i 1 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 1 (0+1)^(k-i-1)]
Note that the size of Rk is Θ(k^2) and L(Rk) = Lk.

Lower Bound for ERE Monitoring (III)
  Lk = {u # w # u' $ w | w ∈ {0,1}^k and u, u' ∈ {0,1,#}*}
- Let A be a monitor for Lk
- When A reads the symbol $, it should "remember" exactly those w that have been seen so far
- There are 2^(2^k) possible distinct situations to remember, so A needs at least 2^k bits of memory to encode each of these situations
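
To make the counting argument concrete, here is a sketch of a hand-written event-by-event monitor for Lk that keeps exactly the information the argument says is unavoidable: one bit per possible block w, so 2^k bits in total. The class name `LkMonitor` and its methods are hypothetical, and k >= 1 is assumed.

```python
class LkMonitor:
    """Event-by-event monitor for Lk = {u # w # u' $ w}, assuming k >= 1."""

    def __init__(self, k):
        self.k = k
        self.seen = 0      # bitset with 2**k bits: which blocks "# w #" occurred
        self.run = None    # bits since the last '#'; None before the first '#'
        self.tail = None   # bits read after '$'; None before '$'
        self.dead = False  # set once the trace can no longer be in Lk

    def step(self, event):
        if self.dead:
            return
        if self.tail is not None:            # after '$': at most k more bits
            if event in ("0", "1") and len(self.tail) < self.k:
                self.tail += event
            else:
                self.dead = True
        elif event == "$":
            self.tail = ""
        elif event == "#":
            if self.run is not None and len(self.run) == self.k:
                self.seen |= 1 << int(self.run, 2)   # record block w
            self.run = ""
        elif event in ("0", "1"):
            if self.run is not None and len(self.run) <= self.k:
                self.run += event            # cap the stored run at k+1 bits
        else:
            self.dead = True

    def verdict(self):
        if self.dead or self.tail is None or len(self.tail) != self.k:
            return False
        return bool((self.seen >> int(self.tail, 2)) & 1)

def check(k, trace):
    monitor = LkMonitor(k)
    for event in trace:
        monitor.step(event)
    return monitor.verdict()
```

For k = 2, `check(2, "#01#11#$01")` accepts, while `check(2, "01#11#$01")` rejects because the block 01 is not enclosed by two # symbols. Apart from O(k) bookkeeping, the state is the 2^k-bit set `seen`.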

Idea of an Event-Consuming Algorithm
- "Consume" each event as it arrives, generating a new ERE monitoring requirement
- Use the notion of derivative: R{a} is the ERE that should hold after seeing event a, in order for R to hold now
- Algorithm A stores an ERE R; when an event a arrives, it replaces R by R{a}; at the end of the trace, A checks whether ε ∈ R
- How can we generate R{a} efficiently? How can we store R{a} compactly?

ERE Syntax
Sorts Ere and Event; subsort Event < Ere
Operations:
  _+_ : Ere Ere -> Ere [assoc comm id: empty]
  _ _ : Ere Ere -> Ere [assoc id: nil]
  _*  : Ere -> Ere
  ¬_  : Ere -> Ere

Derivatives
Related work: Antimirov and Mosses
Operations:
  _{_}  : Ere Event -> Ere
  _?_:_ : Bool Ere Ere -> Ere
  ε ∈ _ : Ere -> Bool
Equations (obvious!):
  (R1 + R2){a} = R1{a} + R2{a}
  (R1 R2){a} = R1{a} R2 + (ε ∈ R1) ? R2{a} : Φ
  (R*){a} = R{a} R*
  (¬R){a} = ¬(R{a})
  ε{a} = Φ
  Φ{a} = Φ
  b{a} = (b == a) ? ε : Φ
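
The derivative equations translate almost directly into code. The following Python sketch mirrors them; the tuple encoding and the names `nullable`, `deriv` and `monitor` are assumptions made for illustration (the talk's implementation is in Maude), and no simplification is applied, so the stored expression can grow with every event.

```python
EMPTY, EPS = ("empty",), ("eps",)             # Φ and ε
def sym(a): return ("sym", a)
def alt(r, s): return ("alt", r, s)           # R + R'
def cat(r, s): return ("cat", r, s)           # R R'
def star(r): return ("star", r)               # R*
def neg(r): return ("neg", r)                 # ¬R

def nullable(r):
    """Decide whether ε ∈ L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])                 # neg

def deriv(r, a):
    """Compute R{a}: what must hold after event a for r to hold now."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], a), deriv(r[2], a))
    if tag == "cat":
        left = cat(deriv(r[1], a), r[2])
        # The Φ branch of the conditional is dropped: R + Φ denotes the same language.
        return alt(left, deriv(r[2], a)) if nullable(r[1]) else left
    if tag == "star":
        return cat(deriv(r[1], a), r)
    return neg(deriv(r[1], a))                # neg

def monitor(r, trace):
    """Consume events one at a time, then test ε ∈ R at the end of the trace."""
    for event in trace:
        r = deriv(r, event)                   # the event itself is discarded
    return nullable(r)

# Safety pattern from the talk: (idle* open (read + write)* close)*
POLICY = star(cat(star(sym("idle")),
              cat(sym("open"),
              cat(star(alt(sym("read"), sym("write"))), sym("close")))))
```

For example, `monitor(POLICY, ["idle", "open", "read", "write", "close"])` accepts, while `monitor(POLICY, ["open", "read"])` rejects because the file is still open at the end of the trace.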

Three Important Simplifying Rules
Without any other rules, R{a1}{a2}…{an} can grow to unbounded size.
Simplifying rules:
  Φ R = Φ
  R + R = R
  R1 R + R2 R = (R1 + R2) R
Let ℛ be the rewriting system defined so far.
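
One way to approximate these rules in ordinary code is with "smart constructors" that normalize terms as they are built. The Python sketch below is an illustration under stated assumptions, not the Maude system from the talk: sums are kept as sets (which gives associativity, commutativity and R + R = R, with Φ as the identity), ε is treated as the identity of concatenation, and the factoring rule only groups summands by their syntactic right factor rather than matching modulo associativity.

```python
EMPTY, EPS = ("empty",), ("eps",)
def sym(a): return ("sym", a)
def star(r): return ("star", r)
def neg(r): return ("neg", r)

def cat(r, s):
    if r == EMPTY:                 # Φ R = Φ
        return EMPTY
    if r == EPS:                   # ε is the identity of concatenation
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def alt(*parts):
    summands = set()               # a set gives AC of _+_ and R + R = R
    for p in parts:
        summands |= p[1] if p[0] == "alt" else {p}
    summands.discard(EMPTY)        # Φ is the identity of _+_
    by_suffix, rest = {}, set()
    for p in summands:             # R1 R + R2 R = (R1 + R2) R
        if p[0] == "cat":
            by_suffix.setdefault(p[2], set()).add(p[1])
        else:
            rest.add(p)
    for suffix, prefixes in by_suffix.items():
        rest.add(cat(alt(*prefixes), suffix))
    if not rest:
        return EMPTY
    return next(iter(rest)) if len(rest) == 1 else ("alt", frozenset(rest))

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"): return True
    if tag in ("empty", "sym"): return False
    if tag == "alt": return any(nullable(p) for p in r[1])
    if tag == "cat": return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])

def deriv(r, a):
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "sym": return EPS if r[1] == a else EMPTY
    if tag == "alt": return alt(*(deriv(p, a) for p in r[1]))
    if tag == "cat":
        left = cat(deriv(r[1], a), r[2])
        return alt(left, deriv(r[2], a)) if nullable(r[1]) else left
    if tag == "star": return cat(deriv(r[1], a), r)
    return neg(deriv(r[1], a))

def size(r):
    tag = r[0]
    if tag in ("empty", "eps", "sym"): return 1
    if tag == "alt": return 1 + sum(size(p) for p in r[1])
    if tag == "cat": return 1 + size(r[1]) + size(r[2])
    return 1 + size(r[1])

def run(r, trace):
    """Return the final verdict and the size of the stored ERE after each event."""
    sizes = []
    for event in trace:
        r = deriv(r, event)
        sizes.append(size(r))
    return nullable(r), sizes

ANY_WORD = neg(EMPTY)              # ¬Φ denotes Σ*
NO_BB = neg(cat(ANY_WORD, cat(sym("b"), cat(sym("b"), ANY_WORD))))
```

On the pattern `NO_BB` ("no two consecutive b's"), the stored expression stays small however long the trace is, because each derivative is normalized back to one of a handful of terms.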

Theorems - 1
- ℛ terminates modulo AC of _+_ and A of _ _
  - φ(R{a}) = (φ(R) + 1)^2 (linear ordering didn't work)
  - Problem for the termination competition?
  - Tested using CiME (thanks to Xavier Urbain)
- ℛ is ground Church-Rosser modulo AC of _+_ and A of _ _
  - Hard to show
  - Non-linear TRS (R1 R + R2 R = (R1 + R2) R)

Theorems - 2
- L(R{a}) = {w | aw ∈ L(R)} for all EREs R
- a1a2…an ∈ L(R) iff ε ∈ R{a1}{a2}…{an}
- R{a1}{a2}…{an} requires O(22m2) space and O(n22m2) time, where m = |R|
  - Hard proof
  - Current proof in proceedings has a (little) error
  - Can be fixed

Experiments and Conjectures
- Implemented the algorithm above in Maude
- Generated all EREs R of size m and all possible evolutions R{a1}{a2}…{an}
- Encouraging results: for |R| = 12, we got |R{a1}…{an}| ≤ 108
Conjectures:
- The ERE monitoring rewriting algorithm runs in space O(2^m) and in time O(n 2^m)
- These are also the lower bounds for ERE membership

Conclusion and Future Work
- Exponential complexity is unavoidable when negation is added to regular expressions (EREs)
- A few rewriting rules provide the best trace membership algorithm known for EREs!
- We have also generated minimal DFAs using the presented algorithm plus circular coinduction
- Algorithm shown to work in space O(22m2) but conjectured to run in O(2^m) space
  - Claim based on experimental results
  - Proving the conjecture can have a big impact!