1 Monitoring Extended Regular Expressions Grigore Rosu University of Illinois at Urbana-Champaign, USA Joint work with Mahesh Viswanathan and Koushik Sen.

2 Increasing Software Reliability
Current solutions
– Human review of code and testing
  Most used in practice
  Usually ad hoc, intensive human support
– (Advanced) Static analysis
  Often scales up
  False positives and negatives, annotations
– (Traditional) Formal methods
  Model checking and theorem proving
  General, good confidence, do not always scale up

3 Trade-offs in System Analysis
[Diagram: trade-offs among automation, efficiency, generality and efficacy]
E.g., standard type checking is automatic, efficient and effective, but reveals a very limited set of errors

4 Runtime Verification and Monitoring
Idea: Let the system run and observe its execution trace. If the trace violates, or appears to violate, the requirements, then report an error, or guide the program to avoid or to hit the error.

5 Runtime Verification and Monitoring
PathExplorer
– Developed jointly with Havelund
– Used on 70,000 lines of C++ code (K9 Rover)
– Found a deadlock in ~10 seconds
– Confirmed a datarace suspicion
Runtime Verification Workshop
– ’01 France (CAV), ’02 Denmark (CAV), ’03 USA (CAV)
– ’04 Spain (ETAPS), …

6 PathExplorer - Overview
[Diagram: the running program sends events over a socket to the observer]
(Joint work with Klaus Havelund of NASA Ames)

7 PathExplorer – the Observer
[Diagram: the event stream enters a dispatcher, which feeds the datarace, deadlock, temporal and ERE modules, grouped under predictive analysis and specification-based monitoring; each module emits warnings]
paxmodules
  module datarace = ‘java pax.Datarace’;
  module deadlock = ‘java pax.Deadlock’;
  module temporal = ‘java pax.Temporal spec’;
  module ERE = ‘java pax.Ere spec’;
end

8 Why (Extended) Regular Expressions?
Ordinary programmers and software engineers understand and use regular expressions
– Perl, Python, etc.
Safety policies are often regular patterns on sequences of states/events:
– (idle* open (read + write)* close)*
– Complementation is needed to say what should not happen: ¬(any* start_1 (¬end_1)* start_2 any*)

9 Extended Regular Expressions (ERE)
Regular expressions with complement
  R ::= Φ | ε | A ∈ Σ | R + R | R · R | R* | ¬R
Language of an ERE
  L(Φ) = ∅
  L(ε) = {ε}
  L(A) = {A}
  L(R + R’) = L(R) ∪ L(R’)
  L(R · R’) = {ww’ | w ∈ L(R), w’ ∈ L(R’)}
  L(R*) = (L(R))*
  L(¬R) = Σ* \ L(R)
Intersection
  R ∩ R’ := ¬(¬R + ¬R’)

10 ERE Membership Problem
Given w ∈ Σ* and R, is it the case that w ∈ L(R)?
Patterns in strings; many applications
– Programming languages (Perl, Python)
– Molecular biology (Knight-Myers ’95)
– Monitoring
Efficient solutions are of great practical interest
From now on, n is the length of the word/trace w and m is the size of the ERE R
– n is typically much, much larger than m

11 What is known (I)
If R does not contain negations, then
– Transform R into an NFA of size O(m) (Aho ’90)
  Solution in time O(nm) and space O(m)
  Improved by Myers ’92 (JACM): time/space O(nm / log n)
– Transform R into a DFA of size O(2^m) (Aho ’90)
  Solution in time O(nm) and space O(2^m)
  Note: a DFA transition takes time logarithmic in the number of states, i.e., O(m)
Negations and their nesting make the membership problem highly non-trivial

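12 Problems with Negation (I)
How to complement an NFA?
– Just complementing the set of final states is wrong!
[Figure: an NFA A with two a-transitions and two b-transitions, L(A) = {ab}; the automaton A’ obtained by complementing the final states of A has L(A’) = {ab, a, ε}]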
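A minimal Python sketch of the point made on this slide. The automaton is a hypothetical reconstruction that is merely consistent with the two languages stated on the slide; the state names q0 to q4 and the helper names are assumptions, not taken from the talk.

```python
from itertools import product

# Assumed NFA A: q0 -a-> q1 -b-> q2 (final)  and  q0 -a-> q3 -b-> q4 (not final)
DELTA = {
    ("q0", "a"): {"q1", "q3"},
    ("q1", "b"): {"q2"},
    ("q3", "b"): {"q4"},
}
STATES = {"q0", "q1", "q2", "q3", "q4"}
START = "q0"
FINAL = {"q2"}


def accepts(word, final):
    """Simulate the NFA on `word`; accept if some reachable state is in `final`."""
    current = {START}
    for symbol in word:
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & final)


def language(final, max_len=3):
    """All accepted words over {a, b} of length at most `max_len`."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return {w for w in words if accepts(w, final)}


print(sorted(language(FINAL)))           # ['ab']
print(sorted(language(STATES - FINAL)))  # ['', 'a', 'ab']
```

The word ab is accepted both before and after flipping the final states, because one of its two runs ends in a final state and the other does not. The flipped automaton therefore does not recognize the complement of L(A).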

13 Problems with Negation (II)
DFAs can be complemented safely by just complementing the set of final states, but
– NFA -> DFA implies exponential state blowup!
– For k nested negations, 2^(2^(…(2^m)…)) states (a tower of k exponentials)
– This makes the membership problem non-elementarily more complex in the context of (nested) negations
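The safe route described on this slide can be sketched in Python as follows: determinize by the subset construction, then complement the final states. The small NFA (accepting only ab) and all names are assumptions chosen for illustration.

```python
from itertools import product

ALPHABET = "ab"
# Assumed example NFA accepting only "ab" (two nondeterministic a-branches).
NFA_DELTA = {("q0", "a"): {"q1", "q3"}, ("q1", "b"): {"q2"}, ("q3", "b"): {"q4"}}
NFA_START, NFA_FINAL = "q0", {"q2"}


def determinize():
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = frozenset({NFA_START})
    delta, todo, seen = {}, [start], {start}
    while todo:
        state = todo.pop()
        for symbol in ALPHABET:
            target = frozenset(
                q2 for q in state for q2 in NFA_DELTA.get((q, symbol), ())
            )
            delta[(state, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    final = {s for s in seen if s & NFA_FINAL}
    return start, delta, seen, final


def dfa_accepts(word, start, delta, final):
    """Run the (total) DFA on `word`."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in final


start, delta, states, final = determinize()
# Safe here: the DFA is deterministic and total (the empty set acts as a sink).
complement_final = states - final

words = ["".join(p) for n in range(4) for p in product(ALPHABET, repeat=n)]
rejected = [w for w in words if not dfa_accepts(w, start, delta, complement_final)]
print(rejected)  # ['ab']
```

In the worst case the subset construction produces 2^m states for an m-state NFA, and every nested negation forces another determinization. This is where the tower of exponentials on the slide comes from.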

14 What is known (II)
Dynamic programming algorithm (Hopcroft-Ullman ’79)
– Time O(n^3 m) and space O(n^2 m)
Special synchronized alternating automata
– (Yamamoto ’02): intersection but not negation
– (Kupferman-Zuhovitzky ’02): general ERE
– Time O(n^2 m) and space O(nm + kn^2), where k is the number of negations and intersections
The algorithms above store the word; this is unacceptable in many practical situations

15 Desired Behavior - Monitoring
[Diagram: the running program sends events over a socket to the observer]
Algorithms that process and then discard each event are desired in practice, since words or execution traces can be extremely long

16 Challenges and Talk Overview
What is the lower space/time bound of the ERE monitoring problem (to process one event)?
– Ω(2^(c√m)) for space
What is a reasonable upper bound for the ERE monitoring problem (to process one event)?
– Rewriting algorithm in O(2^(2m^2)) space/time
How to generate optimal monitors for ERE?
– Optimal monitor generation by coinduction

17 Lower Bound for ERE Monitoring (I)
Consider the language
(Chandra-Kozen-Stockmeyer ’81 in alternation)
(Kupferman-Vardi ’98 in model checking)
  L_k = { u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}* }
We show that
– There is an ERE R_k of size Θ(k^2) with L(R_k) = L_k
– Any monitoring algorithm for L_k needs Ω(2^k) space
So we can conclude that the space lower bound for ERE monitoring is Ω(2^(c√m))

18 Lower Bound for ERE Monitoring (II)
  L_k = { u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}* }
Note that the size of R_k is Θ(k^2) and L(R_k) = L_k
  R_k = (¬$)* $ (¬$)* ∩ (0+1+#)* # ∩_{i=0}^{k-1} [ (0+1)^i 0 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 0 (0+1)^(k-i-1) + (0+1)^i 1 (0+1)^(k-i-1) # (0+1+#)* $ (0+1)^i 1 (0+1)^(k-i-1) ]
– (¬$)* $ (¬$)*: there should be exactly one $ symbol, and …
– (0+1+#)* #: there should be some sequence of 0, 1, #, followed by a # and then by a w …
– ∩_{i=0}^{k-1} [ … ]: each letter in w should appear after $ at exactly the same position …

19 Lower Bound for ERE Monitoring (III)
  L_k = { u # w # u’ $ w | w ∈ {0,1}^k and u, u’ ∈ {0,1,#}* }
Let A be a monitor for L_k
When A reads the symbol $, it should “remember” exactly those w that have been seen so far
There are 2^(2^k) possible distinct situations to remember, so A needs at least 2^k bits of memory to encode each of these situations
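The counting step behind the lower bound can be written out as a short derivation. This is a sketch: the constants C and c are introduced here for illustration and do not appear on the slides, and only the upper half of the size bound on R_k (m at most C k^2) is used.

```latex
\begin{align*}
\#\{\text{sets of words } w \in \{0,1\}^k\} &= 2^{2^k}, \\
\text{bits needed to tell these sets apart} &\ge \log_2 2^{2^k} = 2^k, \\
m = |R_k| \le C k^2 \;&\Longrightarrow\; k \ge \sqrt{m / C}, \\
\text{space} \ge 2^k \;&\Longrightarrow\; \text{space} \ge 2^{c\sqrt{m}}
  \quad \text{with } c = 1/\sqrt{C}.
\end{align*}
```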

20 Idea of an Event-Consuming Algorithm
“Consume” each event as it arrives, generating a new ERE monitoring requirement
Use the notion of derivative
– R{a} is the ERE that should hold after seeing event a, in order for R to hold now
– Algorithm A stores an ERE R, and when an event a arrives it replaces R by R{a}; at the end of the trace, A checks whether ε ∈ R
– How can we generate R{a} efficiently?
– How can we store R{a} compactly?

21 ERE Syntax
Sorts Ere and Event; subsort Event < Ere
Operations
  Φ : -> Ere
  ε : -> Ere
  _+_ : Ere Ere -> Ere [assoc comm id: empty]
  _ _ : Ere Ere -> Ere [assoc id: nil]
  _* : Ere -> Ere
  ¬ _ : Ere -> Ere

22 Derivatives
Operations
  _{_} : Ere Event -> Ere
  _?_:_ : Bool Ere Ere -> Ere
  ε ∈ _ : Ere -> Bool
Equations
  (R1 + R2){a} = R1{a} + R2{a}
  (R1 R2){a} = R1{a} R2 + (ε ∈ R1) ? R2{a} : Φ
  (R*){a} = R{a} R*
  (¬R){a} = ¬(R{a})
  ε{a} = Φ
  Φ{a} = Φ
  b{a} = (b == a) ? ε : Φ
Obvious!
Related work: Antimirov and Mosses
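The derivative equations can be turned into a small event-consuming monitor. The Python sketch below assumes a tuple encoding of EREs, and the constructor names are hypothetical. The slide does not list the equations for ε ∈ _, so `nullable` uses the standard definition. No simplification is applied, so the stored term can grow with every event.

```python
EMPTY = ("empty",)  # Φ
EPS = ("eps",)      # ε


def sym(a):
    return ("sym", a)


def plus(r1, r2):
    return ("plus", r1, r2)


def cat(r1, r2):
    return ("cat", r1, r2)


def star(r):
    return ("star", r)


def neg(r):
    return ("not", r)


def nullable(r):
    """ε ∈ R: does the language of r contain the empty word? (standard definition)"""
    tag = r[0]
    if tag in ("empty", "sym"):
        return False
    if tag in ("eps", "star"):
        return True
    if tag == "plus":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])  # tag == "not"


def derive(r, a):
    """R{a}, following the equations on the slide one by one."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "plus":
        return plus(derive(r[1], a), derive(r[2], a))
    if tag == "cat":
        rest = derive(r[2], a) if nullable(r[1]) else EMPTY
        return plus(cat(derive(r[1], a), r[2]), rest)
    if tag == "star":
        return cat(derive(r[1], a), r)
    return neg(derive(r[1], a))  # tag == "not"


def member(trace, r):
    """Consume the events one at a time, then check ε ∈ R at the end of the trace."""
    for event in trace:
        r = derive(r, event)
    return nullable(r)


# The safety policy (idle* open (read + write)* close)* from slide 8.
policy = star(cat(star(sym("idle")),
                  cat(sym("open"),
                      cat(star(plus(sym("read"), sym("write"))), sym("close")))))
print(member(["idle", "open", "read", "write", "close"], policy))  # True
print(member(["open", "close", "read"], policy))                   # False
```

Only the current ERE is kept between events. The trace itself is never stored, which is the property the talk is after.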

23 Three Important Simplifying Rules
Without any other rules, R{a1}{a2}…{an} can grow to unbounded size
Simplifying rules
  Φ R = Φ
  R + R = R
  R1 R + R2 R = (R1 + R2) R
Let R denote the rewriting system defined so far

24 Theorems (RTA’03)
The rewriting system R is terminating and ground Church-Rosser modulo AC of _+_ and A of _ _
L(nf_AC(R{a})) = {w | aw ∈ L(R)} for all EREs R
a1a2…an ∈ L(R) iff ε ∈ R{a1}{a2}…{an}
R{a1}{a2}…{an} requires O(2^(2m^2)) space and O(n 2^(2m^2)) time, where m = |R|

25 Problems …
The previous algorithm is not synchronous!
– Unless we check for emptiness after processing each event, which is very expensive
How to generate a minimal monitor for ERE avoiding the highly exponential state explosion?
Solution: Circular Coinduction
– Related work by Rutten: no negation

26 Hidden Logic Behavioral Specification
Behavioral specification
– Tuple (V, H, Γ, Σ, E), or simply (Γ, Σ, E)
– Sorts S = V ∪ H
  V = visible sorts (stand for data: integers, reals, chars, etc.)
  H = hidden sorts (stand for states, objects, black boxes, etc.)
– Operations Γ ⊆ Σ
  Σ is an S-signature
  Γ is a subsignature of Σ of behavioral operations
– E is a set of Σ-equations

27 Contexts and Experiments
A Γ-context is a Γ-term with a hidden “slot”
A Γ-experiment is a Γ-context of visible result
[Diagram: operations in Γ applied on top of the slot z : h; the result is visible if the context is a Γ-experiment]

28 Behavioral Equivalence
Models are called hidden Σ-algebras: A, A’, …
Behavioral equivalence on A: a ≡ a’
– Identity on visible carriers
– a ≡_h a’ iff A_ξ(a) = A_ξ(a’) for any Γ-experiment ξ
[Diagram: a Γ-experiment applied to a and to a’ gives equal visible results, A_ξ(a) = A_ξ(a’)]

29 Behavioral Satisfaction
Let (∀X) t =_h t’ be a Σ-equation and A a hidden Σ-algebra
A behaviorally satisfies (∀X) t =_h t’, written A |≡ (∀X) t =_h t’, iff θ(t) ≡_h θ(t’) for any map θ : X → A
The same notation is used for a behavioral specification B = (Γ, Σ, E): B |≡ (∀X) t =_h t’

30 Proving Behavioral Equivalence
Behavioral satisfaction is known to be Π_2^0-hard, so
– No way to automatically prove any truth
– No way to automatically disprove any falsity
– Hidden logics are incomplete
Coinduction and context induction are very strong
– Both require human support
Circular coinduction is an automatic procedure
– Tuned and tested on hundreds of examples: streams, protocols (ABP), Peterson’s mutual exclusion, etc.
– Supported by BOBJ, prototyped in Maude

31 Circular Coinduction in a Nutshell
“Derive” the original proof goal until ending up in circles
[Diagram: a proof graph whose goals are derived along the operations a, m1, m2 until they reach visible equalities such as 5 = 5, 9 = 9, 0 = 0, or a goal already in the graph]
Modulo substitutions, “special” contexts and equational reasoning
Moreover, all the behavioral equalities on the proof graph are true: lemma discovery!
“Explanation?”
(1) All possibilities to distinguish the two are exhaustively explored
(2) Any experiment can be “consumed” bottom-up, ending in a “visible” node
(3) A congruent binary relation R is built; but behavioral equivalence is the largest!
(4) Context induction: the nodes above form the “induction hypothesis”

32 zip(zero, one) = blink
Cobasis {h, t}
– h: 0 = 0
– t: zip(one, zero) = t(blink)
  – h: 1 = 1
  – t: zip(zero, one) = blink (circularity)

33 zip(zero, one) = blink
Cobasis {h, ht, tt}
– h: 0 = 0
– ht: 1 = 1
– tt: zip(zero, one) = blink (circularity)

34 zip(odd(S), even(S)) = S
Cobasis {h, t}
– h: h(S) = h(S)
– t: zip(even(S), even(t(S))) = t(S)
  – h: h(t(S)) = h(t(S))
  – t: zip(even(t(S)), even(t(t(S)))) = t(t(S)) (circularity, modulo the substitution S ↦ t(S))

35 zip(odd(S), even(S)) = S
Cobasis {h, odd, even}
– h: h(S) = h(S)
– odd: odd(S) = odd(S)
– even: even(S) = even(S)
One can prove by {h,t}-circular coinduction that
  odd(zip(S,S’)) = S
  even(zip(S,S’)) = S’
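The stream equalities on slides 32 to 35 can be spot-checked on finite prefixes. The Python sketch below models a stream as a function from positions to values. The definitions of `odd`, `even` and `zip_` are assumptions inferred from the equations on the slides, and comparing prefixes is only a sanity check, not a substitute for the coinductive proof.

```python
def zero(n):
    return 0


def one(n):
    return 1


def blink(n):
    return n % 2  # 0 1 0 1 ...


def h(s):
    return s(0)  # head


def t(s):
    return lambda n: s(n + 1)  # tail


def odd(s):
    return lambda n: s(2 * n)  # 1st, 3rd, 5th, ... elements


def even(s):
    return lambda n: s(2 * n + 1)  # 2nd, 4th, 6th, ... elements


def zip_(s1, s2):
    """Interleave two streams, starting with the head of s1."""
    return lambda n: s1(n // 2) if n % 2 == 0 else s2(n // 2)


def prefix(s, k=20):
    return [s(i) for i in range(k)]


def naturals(n):
    return n  # an arbitrary test stream S (assumption)


print(prefix(zip_(zero, one), 6))                       # [0, 1, 0, 1, 0, 1]
print(prefix(zip_(odd(naturals), even(naturals)), 6))   # [0, 1, 2, 3, 4, 5]
```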

36 Behavioral Specification of EREs
B = (V, H, Γ, Σ, E) where
– V contains Event and Bool
– H contains Ere
– Σ contains Φ, ε, _+_, _ _, _*, ¬ _
– E contains all equations defined before
– Γ contains
  ε ∈ _ : Ere -> Bool
  _{_} : Ere Event -> Ere
Theorem: B behaviorally satisfies R = R’ iff L(R) = L(R’)

37 (a + b)* = (a* b*)*
– ε ∈ _: true = true
– _{a}: (a + b)* = a* b* (a* b*)*
– _{b}: (a + b)* = b* (a* b*)*
Applying ε ∈ _, _{a} and _{b} to the two derived goals again yields only true = true and goals already in the graph
Moreover, all the equivalences in the proof graph are true!
Theorem: Circular Coinduction is a decision procedure for ERE language equality

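38 Generating Minimal DFAs for EREs
[Diagram: starting from R, the derivatives R{a}, R{b}, … are generated as DFA states along a- and b-transitions; each newly generated ERE, such as R’{a}, is compared with the existing states R’, R’’, …: equivalent?]
(1) Maintain a set C of pairs of equivalent EREs
(2) Check each new ERE for equivalence with the EREs already existing in the DFA
– First in C
– Then by CC. If an equivalent ERE is found, then add the new circularities to C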
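A simplified Python sketch of DFA generation from derivatives. It is not the circular coinduction procedure of the talk: states are merged only when their normal forms are syntactically identical (sums are kept as sets, with a few Φ and ε simplifications), which is the classical derivative construction. All names are hypothetical, and the resulting DFA is not necessarily minimal.

```python
EMPTY, EPS = ("empty",), ("eps",)  # Φ and ε


def sym(a):
    return ("sym", a)


def star(r):
    return ("star", r)


def neg(r):
    return ("not", r)


def plus(*rs):
    """Sum in normal form: flattened, unordered, duplicate-free (R + R = R), no Φ."""
    alts = set()
    for r in rs:
        if r[0] == "plus":
            alts |= r[1]
        elif r != EMPTY:
            alts.add(r)
    if len(alts) <= 1:
        return next(iter(alts), EMPTY)
    return ("plus", frozenset(alts))


def cat(r1, r2):
    """Concatenation with Φ R = Φ (and, additionally, R Φ = Φ and ε R = R)."""
    if EMPTY in (r1, r2):
        return EMPTY
    return r2 if r1 == EPS else ("cat", r1, r2)


def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "plus":
        return any(nullable(x) for x in r[1])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    return False  # Φ or a single event


def derive(r, a):
    tag = r[0]
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "plus":
        return plus(*(derive(x, a) for x in r[1]))
    if tag == "cat":
        left = cat(derive(r[1], a), r[2])
        return plus(left, derive(r[2], a)) if nullable(r[1]) else left
    if tag == "star":
        return cat(derive(r[1], a), r)
    if tag == "not":
        return neg(derive(r[1], a))
    return EMPTY  # Φ{a} = ε{a} = Φ


def build_dfa(r, alphabet):
    """Each distinct normalized derivative of r becomes one DFA state."""
    states, todo, delta = {r}, [r], {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            target = derive(s, a)
            delta[(s, a)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    final = {s for s in states if nullable(s)}
    return states, delta, final


r1 = star(plus(sym("a"), sym("b")))             # (a + b)*
r2 = star(cat(star(sym("a")), star(sym("b"))))  # (a* b*)*
print(len(build_dfa(r1, "ab")[0]))  # 1
print(len(build_dfa(r2, "ab")[0]))  # 3
```

Both expressions denote the same language, yet the purely syntactic construction yields three states for (a* b*)*. Merging states that are equivalent rather than merely identical, as steps (1) and (2) on the slide do, is what closes this gap.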

39 Implementation
BOBJ cannot be used because it does not return the set of circularities
Implemented a specialized circular coinduction algorithm in Maude
Web server at http://fsl.cs.uiuc.edu
– A Perl CGI script which calls Maude
– Generates JPEG, PS, and DOT versions of the DFA

40 Conclusion and Future Work
Exponential complexity is unavoidable when negation is added to regular expressions (EREs)
A few rewriting rules provide the best trace membership algorithm known for EREs
Generation of minimal DFAs for EREs by circular coinduction (CC) avoids state explosion
– To be part of PathExplorer at NASA Ames
Behavioral Maude with circular coinduction
Inductive/Coinductive Theorem Prover (ICTP)
Behavioral Rewriting Logic
