Grigore Rosu Mahesh Viswanathan

Testing Extended Regular Language Membership Incrementally by Rewriting
Grigore Rosu Mahesh Viswanathan University of Illinois at Urbana-Champaign, USA

Increasing Software Reliability
Current solutions:
Human review of code and testing: most used in practice; usually ad hoc, with intensive human support.
(Advanced) static analysis: often scales up; false positives and negatives, annotations.
(Traditional) formal methods: model checking and theorem proving; general, good confidence, but do not always scale up.

Runtime Verification and Monitoring
Idea: Let the system run and observe its execution trace. If the trace violates, or appears to violate, the requirements, then report an error or guide the program to avoid or to hit the error.

Runtime Verification and Monitoring
PathExplorer – developed jointly with Havelund
Used on 70,000 lines of C++ code (K9 Rover): found a deadlock in ~10 seconds and confirmed a datarace suspicion.
Runtime Verification Workshop: ’01 – France (CAV), ’02 – Denmark (CAV), ’03 – USA (CAV), ’04 – Spain (ETAPS), …

PathExplorer - Overview
[Diagram: a running program sends events over a socket to the observer.]
(Joint work with Klaus Havelund of NASA Ames)

PathExplorer – the Observer
Module configuration:
paxmodules
  module datarace = ‘java pax.Datarace’;
  module deadlock = ‘java pax.Deadlock’;
  module temporal = ‘java pax.Temporal spec’;
  module ERE = ‘java pax.Ere spec’;
end
[Diagram: a dispatcher forwards the event stream to two groups of modules. Predictive analysis (datarace, deadlock) and specification-based monitoring (temporal, ERE) each emit warnings.]

Why (Extended) Regular Expressions?
Ordinary programmers and software engineers understand and use regular expressions: Perl, Python, etc.
Safety policies are often regular patterns on sequences of states/events:
(idle* open (read + write)* close)*
Complementation is needed to say what should not happen:
¬ (any* start1 (¬ end1)* start2 any*)
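
The two policies above can be tried out with an ordinary regular expression library. The following is only a sketch in Python: the one-letter event encoding, the extra event name "other" and the function names are assumptions made for this illustration, and (¬ end1)* is read informally as "any events other than end1". Since Python's re module has no complement operator, the negated policy is checked by testing that the forbidden pattern does not match.

```python
import re

# Hypothetical one-letter encoding of events, chosen only for this sketch.
CODE = {"idle": "i", "open": "o", "read": "r", "write": "w", "close": "c",
        "start1": "S", "end1": "E", "start2": "T", "other": "x"}

def encode(events):
    """Turn a list of event names into a string of one-letter codes."""
    return "".join(CODE[e] for e in events)

# (idle* open (read + write)* close)*
FILE_POLICY = re.compile(r"(i*o[rw]*c)*")
# any* start1 (not end1)* start2 any*  -- the pattern that must NOT occur
FORBIDDEN = re.compile(r".*S[^E]*T.*")

def obeys_file_policy(events):
    return FILE_POLICY.fullmatch(encode(events)) is not None

def obeys_start_policy(events):
    # Complement is emulated by negating the match result.
    return FORBIDDEN.fullmatch(encode(events)) is None
```

This emulation only handles a single outermost negation; nested complements are what the ERE machinery in the rest of the talk is for.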

Extended Regular Expressions (ERE)
Regular expressions with complement:
R ::= Φ | ε | A | R + R | R · R | R* | ¬R
Language of an ERE:
L(Φ) = Φ (the empty language)
L(ε) = {ε}
L(A) = {A}
L(R + R’) = L(R) ∪ L(R’)
L(R · R’) = {ww’ | w ∈ L(R), w’ ∈ L(R’)}
L(R*) = (L(R))*
L(¬R) = Σ* \ L(R)
Intersection is derived: R ∩ R’ := ¬(¬R + ¬R’)

ERE Membership Problem
Given w ∈ Σ* and R, is it the case that w ∈ L(R)?
Patterns in strings; many applications: programming languages (Perl, Python), molecular biology (Knight-Myers95), monitoring.
Efficient solutions are of great practical interest.
From now on, n is the length of the word/trace w and m is the size of the ERE R. Typically n is much, much larger than m.

What is known (I)
If R does not contain negations, then:
Transform R into an NFA of size O(m) (Aho ’90): solution in time O(nm) and space O(m).
Improved by Myers ’92 (JACM): time/space O(nm / log n).
Transform R into a DFA of size O(2^m) (Aho ’90): solution in time O(nm) and space O(2^m). Note: transitions in a DFA take logarithmic time.
Negations and their nesting make the membership problem highly non-trivial.

Problems with Negation (I)
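How to complement an NFA? Just complementing the set of final states is wrong!
[Diagram: an NFA A and the automaton A’ obtained by complementing its set of final states, with transitions labeled a and b.]
L(A) = {ab}, L(A’) = {ab, a, ε}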
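
The effect can be reproduced with a small simulation. The automaton below is an assumed reconstruction, not necessarily the one drawn on the slide: it has two a-b paths from the start state, only one of which ends in a final state. The state names and helper functions are hypothetical.

```python
from itertools import product

def accepts(delta, start, finals, word):
    """Simulate an NFA by tracking the set of states reachable so far."""
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), ())}
    return bool(current & finals)

def language(delta, start, finals, alphabet="ab", max_len=3):
    """All accepted words up to max_len (enough for this tiny example)."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return {w for w in words if accepts(delta, start, finals, w)}

# Assumed NFA A: q0 -a-> q1 -b-> q2 (final) and q0 -a-> q3 -b-> q4 (not final).
STATES = {"q0", "q1", "q2", "q3", "q4"}
DELTA = {
    ("q0", "a"): {"q1", "q3"},
    ("q1", "b"): {"q2"},
    ("q3", "b"): {"q4"},
}
FINALS = {"q2"}

lang_a = language(DELTA, "q0", FINALS)                 # language of A
lang_flipped = language(DELTA, "q0", STATES - FINALS)  # language of A'
```

The word ab is accepted by both automata, because one run ends in q2 and another in q4, so flipping the final states does not yield the complement.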

Problems with Negation (II)
DFAs can be complemented safely by just complementing the set of final states, but NFA -> DFA implies exponential state blowup!
For k nested negations: 2^(2^(…(2^m)…)) states, a tower of k exponentials.
This makes the membership problem non-elementarily more complex in the context of (nested) negations.

What is known (II)
Dynamic programming algorithm (Hopcroft-Ullman ’79): time O(n^3 m) and space O(n^2 m).
Special synchronized alternating automata: (Yamamoto ’02) – intersection but not negation; (Kupferman-Zuhovitzky ’02) – general ERE. Time O(n^2 m) and space O(nm + kn^2), where k is the number of negations and intersections.
The algorithms above store the word; this is unacceptable in many practical situations.

Desired Behavior - Monitoring
Algorithms that process and then discard each event are desired in practice, since words or execution traces can be extremely long.
[Diagram: a running program sends events over a socket to the observer.]

Challenges and Talk Overview
What is the lower space/time bound of the ERE monitoring problem (to process one event)? Ω(2^(c·√m)) for space.
What is a reasonable upper bound for the ERE monitoring problem (to process one event)? A rewriting algorithm in O(2^(2^(m^2))) space/time.

Lower Bound for ERE Monitoring (I)
Consider the language (Chandra-Kozen-Stockmeyer81 in alternation) (Kupferman-Vardi98 in model checking):
L_k = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
We show that:
There is an ERE R_k of size Θ(k^2) with L(R_k) = L_k.
Any monitoring algorithm for L_k needs Ω(2^k) space.
So we can conclude that the space lower bound for ERE monitoring is Ω(2^(c·√m)).

Lower Bound for ERE Monitoring (II)
L_k = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
R_k is the intersection of two parts.
There should be exactly one $ symbol:
(¬$)* $ (¬$)*
There should be some sequence of 0, 1, #, followed by a # and then by a w, and each letter in w should appear after $ at exactly the same position:
(0+1+#)* # ∩_{i=0}^{k-1} [ (0+1)^i 0 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 0 (0+1)^(k-i-1) + (0+1)^i 1 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 1 (0+1)^(k-i-1) ]
Note that the size of R_k is Θ(k^2) and L(R_k) = L_k.

Lower Bound for ERE Monitoring (III)
L_k = {u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}*}
Let A be a monitor for L_k.
When A reads symbol $, it should “remember” exactly those w that have been seen so far.
There are 2^(2^k) possible distinct situations to remember, so A needs at least 2^k bits of memory to encode each of these situations.
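
The counting argument can be made concrete with a hand-written monitor for L_k. This is only an illustrative sketch (the class and helper names are hypothetical, and it is not the rewriting algorithm of the talk): it consumes one symbol at a time and keeps the set of k-bit blocks seen between two # symbols. That set can be any subset of the 2^k possible words, which is where the 2^(2^k) situations come from.

```python
class LkMonitor:
    """Event-by-event monitor for L_k = {u # w # u' $ w}."""

    def __init__(self, k):
        self.k = k
        self.seen = set()         # k-bit words w already seen as "# w #"
        self.block = None         # symbols since the last '#'; None before the first '#'
        self.after_dollar = None  # symbols read after '$'; None until '$' arrives
        self.dead = False         # True once the trace can no longer be in L_k

    def consume(self, symbol):
        if self.dead:
            return
        if self.after_dollar is not None:
            # After '$' only a word of at most k bits may follow.
            if symbol in "01" and len(self.after_dollar) < self.k:
                self.after_dollar += symbol
            else:
                self.dead = True
        elif symbol == "$":
            self.after_dollar = ""
        elif symbol == "#":
            # A block closed by '#' counts only if it was opened by '#' and has length k.
            if self.block is not None and len(self.block) == self.k:
                self.seen.add(self.block)
            self.block = ""
        elif symbol in "01":
            # Never buffer more than k+1 symbols of a block.
            if self.block is not None and len(self.block) <= self.k:
                self.block += symbol
        else:
            self.dead = True

    def accepts(self):
        return (not self.dead and self.after_dollar is not None
                and self.after_dollar in self.seen)


def in_lk(k, word):
    monitor = LkMonitor(k)
    for symbol in word:
        monitor.consume(symbol)
    return monitor.accepts()
```

For example, in_lk(2, "#01#10#1#$10") holds because 10 occurs between two # symbols before the $, while in_lk(2, "01#$01") does not because the leading 01 is not preceded by #.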

Idea of an Event-Consuming Algorithm
“Consume” each event as it arrives, generating a new ERE monitoring requirement.
Use the notion of derivative: R{a} is the ERE that should hold after seeing event a, in order for R to hold now.
Algorithm A stores an ERE R; when an event a arrives, it replaces R by R{a}. At the end of the trace, A checks whether ε ∈ L(R).
How can we generate R{a} efficiently? How can we store R{a} compactly?

ERE Syntax
Sorts Ere and Event; subsort Event < Ere
Operations:
_+_ : Ere Ere -> Ere [assoc comm id: empty]
_ _ : Ere Ere -> Ere [assoc id: nil]
_* : Ere -> Ere
¬_ : Ere -> Ere

Derivatives
Related work: Antimirov and Mosses
Operations:
_{_} : Ere Event -> Ere
_?_:_ : Bool Ere Ere -> Ere
ε_ : Ere -> Bool (tests whether the empty word belongs to the language)
Equations (all of them the obvious ones):
(R1 + R2){a} = R1{a} + R2{a}
(R1 R2){a} = R1{a} R2 + (εR1) ? R2{a} : Φ
(R*){a} = R{a} R*
(¬R){a} = ¬(R{a})
ε{a} = Φ
Φ{a} = Φ
b{a} = (b == a) ? ε : Φ
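
The derivative equations translate almost line by line into a recursive function. The sketch below is in Python rather than Maude and makes several assumptions of its own: EREs are nested tuples, events are strings, and the names nullable, derive and member are hypothetical. The equations for the ε_ test are not listed on the slide, so the standard ones are used. No simplification is applied, so the stored expression grows with every event.

```python
# EREs as nested tuples; event names are plain strings.
EMPTY = ("empty",)   # Φ
EPS = ("eps",)       # ε

def ev(name):
    return ("event", name)

def nullable(r):
    """The ε_ test: True iff the empty word belongs to L(r)."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "event"):
        return False
    if tag == "plus":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(f"unknown ERE node: {tag}")

def derive(r, a):
    """R{a}: the ERE that must hold after event a for r to hold now."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "event":
        return EPS if r[1] == a else EMPTY
    if tag == "plus":
        return ("plus", derive(r[1], a), derive(r[2], a))
    if tag == "cat":
        left = ("cat", derive(r[1], a), r[2])
        right = derive(r[2], a) if nullable(r[1]) else EMPTY
        return ("plus", left, right)
    if tag == "star":
        return ("cat", derive(r[1], a), r)
    if tag == "not":
        return ("not", derive(r[1], a))
    raise ValueError(f"unknown ERE node: {tag}")

def member(r, trace):
    """Consume the trace event by event, then test for the empty word."""
    for a in trace:
        r = derive(r, a)
    return nullable(r)

# (open (read + write)* close)*
POLICY = ("star", ("cat", ev("open"),
                   ("cat", ("star", ("plus", ev("read"), ev("write"))),
                    ev("close"))))
# ¬(any* bad any*), writing any* as ¬Φ
NO_BAD = ("not", ("cat", ("not", EMPTY), ("cat", ev("bad"), ("not", EMPTY))))
```

Only the current expression is kept between events, so the trace itself is never stored.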

Three Important Simplifying Rules
Without any other rules, R{a1}{a2}…{an} can grow to unbounded size.
Simplifying rules:
Φ R = Φ
R + R = R
R1 R + R2 R = (R1 + R2) R
Let R be the rewriting system defined so far.
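
One way to picture the three rules is as "smart constructors" that build terms already in simplified form. This is a rough Python sketch under several assumptions: terms are nested tuples, a sum is stored as a frozenset of summands (which gives associativity, commutativity and R + R = R for free), and the identities Φ for _+_ and ε for concatenation are taken from the operator attributes. Associativity of concatenation is not normalized here, and the function names are hypothetical.

```python
EMPTY = ("empty",)   # Φ
EPS = ("eps",)       # ε

def cat(r1, r2):
    """Concatenation with Φ R = Φ, plus the identity ε R = R."""
    if r1 == EMPTY:
        return EMPTY
    if r1 == EPS:
        return r2
    return ("cat", r1, r2)

def summands(r):
    """The set of summands of r; Φ is the identity of + and contributes none."""
    if r == EMPTY:
        return frozenset()
    if r[0] == "plus":
        return r[1]
    return frozenset([r])

def plus(r1, r2):
    """Sum with R + R = R and R1 R + R2 R = (R1 + R2) R."""
    terms = summands(r1) | summands(r2)   # set union removes duplicates
    by_right = {}                         # right factor -> list of left factors
    others = set()
    for t in terms:
        if t[0] == "cat":
            by_right.setdefault(t[2], []).append(t[1])
        else:
            others.add(t)
    for right, lefts in by_right.items():
        left = EMPTY
        for l in lefts:
            left = plus(left, l)          # merge the left factors into one sum
        others.add(cat(left, right))
    if not others:
        return EMPTY
    if len(others) == 1:
        return next(iter(others))
    return ("plus", frozenset(others))

a, b, c = ("event", "a"), ("event", "b"), ("event", "c")
```

With these constructors, plus(cat(a, c), cat(b, c)) yields the single product (a + b) c instead of a sum of two products, which is the kind of sharing that keeps repeated derivatives from accumulating copies of the same tail.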

Theorems - 1
R terminates modulo AC of _+_ and A of _ _
φ(R{a}) = (φ(R) + 1)^2 (a linear ordering didn’t work)
Problem for the termination competition?
Tested using CiME (thanks to Xavier Urbain)
R is ground Church-Rosser modulo AC of _+_ and A of _ _
Hard to show: non-linear TRS (R1 R + R2 R = (R1 + R2) R)

Theorems - 2
L(R{a}) = {w | aw ∈ L(R)} for all EREs R
a1a2…an ∈ L(R) iff ε ∈ L(R{a1}{a2}…{an})
R{a1}{a2}…{an} requires O(2^(2^(m^2))) space and O(n · 2^(2^(m^2))) time, where m = |R|
Hard proof. The current proof in the proceedings has a (little) error, which can be fixed.

Experiments and Conjectures
Implemented the algorithm above in Maude.
Generated all EREs R of size m and all possible evolutions R{a1}{a2}…{an}.
Encouraging results: for |R| = 12, we got |R{a1}…{an}| ≤ 108.
Conjectures:
The ERE monitoring rewriting algorithm runs in space O(2^m) and in time O(n · 2^m).
These are also the lower bounds for ERE membership.

Conclusion and Future Work
Exponential complexity is unavoidable when negation is added to regular expressions (EREs).
A few rewriting rules provide the best trace membership algorithm known for EREs!
We have also generated minimal DFAs using the presented algorithm plus circular coinduction.
The algorithm is shown to work in space O(2^(2^(m^2))) but conjectured to run in O(2^m) space. The claim is based on experimental results.
Proving the conjecture can have a big impact!
