Strauss: A Specification Miner


1 Strauss: A Specification Miner
Glenn Ammons, Department of Computer Sciences, University of Wisconsin-Madison

2 How should programs behave?
Strauss mines temporal specifications; for example, "always call free after malloc".
Why? To debug, to verify, to test, to modify, to understand.
Goal: Introduce the problem.
Introduction: This is the question I want to answer.
Body: And here's why.
Transition: Let's look at some ways to answer this question.
Ras: Low-level comment: before talking about the reasons why, say what _my_ rules are. The flow to the punchline about constraining programs is abrupt.
Ras: Likes this. When I talk, narrow down "rules".
Ras: Reorder: move "to verify" up to the top. At some point, make the analogy to structured programming or types: another constraint on the programs we can write. Say what rules are now. Move "comparing miners" here. [DONE]

3 How should programs behave?
Strauss mines temporal specifications; for example, "always call free after malloc".
Why? To debug, to verify, to test, to modify, to understand.
Specs constrain programs, like structured programming, types, etc.

4 Strauss output
For all calls Y := accept(sock = X):
[Figure: NFA with nodes start, X := socket(), Y := accept(sock = X), read(sock = Y), write(sock = Y), close(sock = Y), close(sock = X), end]
Spec. says what programs should do: What order of calls? What values in calls?
Script: This is an example of the kind of spec that Strauss mines. This is a spec for a server that uses the socket interface to accept a connection from a client. The figure is an NFA with nodes labeled by calls in the socket interface, with the values that they manipulate given abstract names. For example, the call at the start node is a call to socket; this call returns a new socket, which is called X in the specification. Each name refers to one value everywhere it occurs in the specification: so, for instance, the socket returned by socket is passed to accept, as accept's sock argument.
The spec says what a server should do: first, it should create a socket X by calling socket. X represents a port on which the server listens for connections. The server then either closes X or goes on to accept a connection. If the server accepts a connection Y, it goes on to use the connection by calling read and write. Eventually, the server must close both Y and X.
This spec has two interesting features. First, as an NFA, it specifies the order in which the server should make calls: for example, it should call accept after calling socket. Second, it quantifies over values: for all sockets X and Y with a particular relationship (namely, Y was created by calling accept with X as the sock argument), the calls that reference X and Y must follow the pattern given by the NFA. For example, X must be the output of a call to socket, and it must be passed later as the sock argument of the accept call that returns Y.
So, in order to learn specs like this one, we will need to pay attention to the order of calls and to the values that calls manipulate. As we'll see on the next slide, Strauss gets both kinds of information from traces.
Goal: Tell the audience what rules are, for me.
Transition: How can we get such rules?
Ras: Maybe add animations showing value flow.
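To make the "order of calls" half of the spec concrete, here is a minimal Python sketch of the socket NFA and a membership check. The state names and the exact transition table are assumptions: they follow one reading of the script, since the edges of the figure are not recoverable from the transcript. Values are already abstracted to X and Y.

```python
# Hypothetical encoding of the socket spec as an NFA. The states and edges
# are one reading of the script, not a copy of the original figure.
START, END = "start", "end"

# state -> event -> set of successor states
TRANSITIONS = {
    START: {"X := socket()": {"opened"}},
    "opened": {
        "close(sock = X)": {END},
        "Y := accept(sock = X)": {"connected"},
    },
    "connected": {
        "read(sock = Y)": {"connected"},
        "write(sock = Y)": {"connected"},
        "close(sock = Y)": {"y_closed"},
    },
    "y_closed": {"close(sock = X)": {END}},
}


def accepts(events):
    """Return True if the NFA can consume every event and stop in END."""
    current = {START}
    for event in events:
        # Follow every transition that matches the event from any live state.
        current = {
            nxt
            for state in current
            for nxt in TRANSITIONS.get(state, {}).get(event, set())
        }
        if not current:
            return False
    return END in current


good = ["X := socket()", "Y := accept(sock = X)", "read(sock = Y)",
        "write(sock = Y)", "close(sock = Y)", "close(sock = X)"]
leaky = ["X := socket()", "Y := accept(sock = X)", "close(sock = X)"]
print(accepts(good), accepts(leaky))
```

The second sequence is rejected because the accepted connection Y is never closed. The "values in calls" half of the spec is handled before this step, when concrete trace values are renamed to X and Y.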

5 Strauss inputs
[Figure: client programs and a library, separated by an interface (e.g., sockets); traces of the interface calls feed into Strauss]
Traces:
... socket(domain = 2, type = 1, proto = 0, return = 7)
7 := socket(domain = 2, type = 1, proto = 0)
Script: Strauss divides the world into three parts: client programs, a library, and an interface between them, and mines specifications by analyzing traces of the calls made by client programs as they execute. For each call, the trace records the name of the call and the values of its arguments: for example, the call shown here is a call to socket with domain argument 2, type argument 1, and proto argument 0; the call returns socket number 7. Strauss sees only the interface calls, so it works even when code for the programs or the library is not available. Also, although in this talk I refer to "client programs" or "libraries", Strauss really applies anywhere there is an interface. For example, you could use Strauss to mine specs about an abstract data type by tracing the operations on the ADT.
OK. Let's say that I am gathering some traces of programs that use the Unix interface, which includes the socket interface. I'd like to use these traces to mine lots of different specifications about Unix, and I don't want to have to rerun the programs every time I want to learn a new spec. So, the traces should record every Unix call and its arguments, because I don't know which ones I'm going to need. What does the miner need to know in order to go from such traces to a specification like this one?
[FLIP TO PRECEDING SLIDE]
I can think of a trace as a sequence of commands to the library. The library has some internal state, and each command changes the state in some way; I don't know how exactly, and I don't need to know how. An interesting thing about this specification is that even though every Unix call changes the state of the Unix library, this specification only mentions socket, accept, read, write, and close calls. What's going on is that, even though other Unix calls, like open or fork, move the library from one state to another state, the specification considers those states to be equivalent.
Another interesting thing about the specification is that the calls in it have to use certain values. For example, the specification doesn't apply to every call to close(), but only to closes for sockets that are used in the pattern in the quantifier. And, if Y is returned by some call to accept, and Y' is returned by some other call to accept, the specification considers them independently. Intuitively, Y labels some part of the library state and Y' labels some other part, and the two parts don't interact. On the other hand, some sockets do interact: for example, there must be some socket that plays the role of X for Y and another socket (or maybe the same one!) that plays the role of X for Y'.
So, the miner needs two kinds of information before it can learn this specification. First, it needs to know which interface calls it should ignore, and which ones it should track. And second, it needs to know how to match values in the trace to library state, and how these states interact.
[FLIP BACK TO THIS SLIDE]
In Strauss, the user supplies a state transition model (or STM) and a seed pattern that partially answer these questions. The state transition model lists the calls that should be tracked. For example, the STM here (which would be appropriate for learning the spec on the previous slide) tells the miner to track socket, close, accept, read, and write. Strauss assumes that each trace value labels a little bit of state in the library. In addition to just listing calls, the STM also places def and use annotations on arguments and return values. A def annotation means that the state labeled by the associated value is changed by the call; for example, the line "def socket()" means that the state labeled by the return value of socket is changed by the call to socket. And, a use annotation means that the state labeled by the associated value is used by the call; for example, the line "read(use sock)" means that read inspects the state labeled by its sock argument. So, the STM says which calls to track and also says how calls read and write the state associated with values.
The final input to Strauss is a seed pattern. The seed pattern tells Strauss where to start mining. Here, Strauss should start mining by considering the values that are used and defined at calls to accept.
Ras: Need better text; talk a little about state, but save details for later (when I show how to produce trace programs).
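As a small illustration of what one trace record carries (call name, argument values, return value), the following Python sketch parses the "7 := socket(...)" form shown on the slide. The `Call` record and `parse_call` helper are hypothetical names introduced here; the talk does not describe Strauss's actual trace file format beyond this one example line.

```python
import re
from dataclasses import dataclass


@dataclass
class Call:
    """One traced interface call (hypothetical record layout)."""
    name: str   # call name, e.g. "socket"
    args: dict  # argument name -> value, kept as strings
    ret: str    # returned value, kept as a string


# Matches lines of the form "<ret> := <name>(<arg> = <value>, ...)".
LINE = re.compile(r"^\s*(\S+)\s*:=\s*(\w+)\((.*)\)\s*$")


def parse_call(line):
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a trace line: {line!r}")
    ret, name, arg_text = match.groups()
    args = {}
    for part in arg_text.split(","):
        if part.strip():
            key, value = part.split("=", 1)
            args[key.strip()] = value.strip()
    return Call(name=name, args=args, ret=ret)


call = parse_call("7 := socket(domain = 2, type = 1, proto = 0)")
print(call)
```

Values are kept as opaque strings on purpose: the miner only needs to know when two calls mention the same value, not what the value means.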

6 Strauss inputs
[Figure: client programs and a library, separated by an interface (e.g., sockets); traces and a state transition model feed into Strauss]
State transition model (STM):
def socket()
close(def use sock)
def accept(use sock)
read(use sock)
write(use sock)
Traces:
... socket(domain = 2, type = 1, proto = 0, return = 7)
7 := socket(domain = 2, type = 1, proto = 0)

7 Strauss inputs
[Figure: client programs and a library, separated by an interface (e.g., sockets); traces, a state transition model, and a seed pattern feed into Strauss]
State transition model (STM):
def socket()
close(def use sock)
def accept(use sock)
read(use sock)
write(use sock)
Traces:
... socket(domain = 2, type = 1, proto = 0, return = 7)
7 := socket(domain = 2, type = 1, proto = 0)
Seed pattern: accept

8 My contributions
A dynamic approach to mining: analyze communication, not source code
A tool for mining specifications
practical: scalable and partially automatic
temporal specs may refer to multiple data values
requires little info about code
Heuristics to tell good code from bad code: "birds of a feather flock together"
Found specifications and bugs
two goals: summarize and predict
Ras: Decompose: not just prediction, but also good for summarizing.
Ras: Too much philosophy in the first contribution. Just say that I am giving one way to analyze a program from its output, instead of from its source code.

9 Summary of results
Mined 8 specs. for X11
example: XInternAtom should only be called during initialization, not in the event loop.
Found 61 bugs
in widely distributed programs: wish (Tk), xpdf, xpilot, ...
included serious races and performance bugs
by verifying dynamically (on traces)
Ras: Need to say what the (race condition) protocol is.
Ras: Remove last two bullets. Say something about how serious these bugs are. [DONE]

10 Related work: obtaining specs.
By hand
From programmers: types; specification languages and annotations
From analysts, testers: abstraction tools (Bandera [Dwyer et al.])
With specification mining
Goal: Put my work in context, so that it's clear that this is a real problem.
Introduction: Here are some current solutions.
Body: Discuss each in turn.
Transition: This talk focuses on Strauss. How exactly does Strauss compare with other miners?
Rachel: PRACTICE.
Ras: Too boring. Highlight my contribution.
Ras: In cites, mention principles but not necessarily names.
Ras: Make the distinction between languages for writing specs and processes for creating specs. [DONE]

11 Comparing spec. miners
Strauss
Specs: temporal, multiple objects, arbitrary NFAs
Mining: from traces, local, no code necessary
Goal: specs. for interfaces
Other work
Concurrent work: FSMs for Java objects [Whaley, Martin, and Lam]; simple templates [Metacompilation]; loop invariants [ESC/Java]
Earlier work: arithmetic properties [Daikon]; cliches [Programmer's Apprentice]; communicating processes [Verisoft]
Rachel: PRACTICE. Don't spend so much time.

12 Overview of Strauss
[Figure: pipeline. Traces and STM -> apply STM -> trace programs (as PDGs) -> extract scenarios (with the seed pattern) -> scenarios -> classify and learn -> specification]
Scenario calls: socket(...), accept(...), read(...), write(...), close(...)
Specification: NFA from start to end over X := socket(), Y := accept(so = X), read(so = Y), write(so = Y), close(so = Y), close(so = X)

13 Trace + STM = Trace program
State transition model (STM):
def socket()
close(def use sock)
def accept(use sock)
read(use sock)
write(use sock)
Trace:
1 7 := socket(domain = 2, type = 1, proto = 0)
2 0x100 := malloc(size = 23)
3 8 := accept(sock = 7, addr = 0x40, addr_len = 0x50)
4 23 := write(sock = 8, buf = 0x100, len = 23)
5 0 := close(sock = 8)
...
Trace program:
Vars: v7, v8
1 v7 := socket()
2 skip
3 v8 := accept(sock = v7)
4 write(sock = v8)
5 v8 := close(sock = v8)
...
Ras: Say positively that we can deal with object collisions.
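The translation from trace to trace program on this slide can be sketched in a few lines of Python. The STM encoding below (a flag for "defines its return value" plus per-argument def/use sets) and the function name `apply_stm` are assumptions made for illustration; only the STM entries and the five trace lines come from the slide.

```python
# Each STM entry: (does the call define its return value?, {arg: annotations}).
# This encoding is an assumption; the entries mirror the slide's STM.
SOCKET_STM = {
    "socket": (True, {}),
    "close": (False, {"sock": {"def", "use"}}),
    "accept": (True, {"sock": {"use"}}),
    "read": (False, {"sock": {"use"}}),
    "write": (False, {"sock": {"use"}}),
}

# (return value, call name, arguments) for the five trace lines on the slide.
TRACE = [
    ("7", "socket", {"domain": "2", "type": "1", "proto": "0"}),
    ("0x100", "malloc", {"size": "23"}),
    ("8", "accept", {"sock": "7", "addr": "0x40", "addr_len": "0x50"}),
    ("23", "write", {"sock": "8", "buf": "0x100", "len": "23"}),
    ("0", "close", {"sock": "8"}),
]


def apply_stm(trace, stm):
    """Turn a trace into a trace program: untracked calls become skip."""
    program = []
    for ret, name, args in trace:
        if name not in stm:
            program.append("skip")
            continue
        ret_is_def, annotations = stm[name]
        # Keep only the annotated arguments, renamed to variables v<value>.
        shown = ", ".join(f"{a} = v{args[a]}" for a in annotations if a in args)
        text = f"{name}({shown})"
        # A def on the return value or on an argument becomes an assignment.
        defined = [f"v{ret}"] if ret_is_def else []
        defined += [f"v{args[a]}" for a, kinds in annotations.items()
                    if "def" in kinds and a in args]
        if defined:
            text = f"{defined[0]} := {text}"
        program.append(text)
    return program


for number, line in enumerate(apply_stm(TRACE, SOCKET_STM), start=1):
    print(number, line)
```

Passing a different STM to the same function yields a different trace program from the same trace, which is why the STM is a parameter rather than a fixed part of the miner.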

16 A different STM
State transition model (STM):
def malloc()
free(def p)
read(use buf)
write(use buf)
Trace:
1 7 := socket(domain = 2, type = 1, proto = 0)
2 0x100 := malloc(size = 23)
3 8 := accept(sock = 7, addr = 0x40, addr_len = 0x50)
4 23 := write(sock = 8, buf = 0x100, len = 23)
5 0 := close(sock = 8)
...
Trace program:
Var: v0x100
1 skip
2 v0x100 := malloc()
3 skip
4 write(buf = v0x100)
5 skip
...
Goal: Explain why the STM is a parameter: different STMs lead to different interpretations, and so allow different transformations, and so lead to different rules. And we can't use one big STM, because that would lead to one big, inscrutable rule.

19 Extracting scenarios from a TPDG
Pick a seed
Slice forwards
Slice backwards
Chop each pair
Ras: Put edges leading in and out, to show that it is infinite. Make it "extracting scenarios from traces". [DONE]

24 Extracting scenarios: example
Vars: v7, v8
1 v7 := socket()
2 v8 := accept(sock = v7)
3 write(sock = v8)
4 v8 := close(sock = v8)
5 v7 := close(sock = v7)
Ras: Can I drop this stuff? [DONE] (decided not to drop them)

28 Extracting scenarios: example
Steps applied to the trace program above, built up one per slide:
Pick a seed
Slice forwards
Slice backwards
Chop each pair
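The seed-and-slice steps can be sketched over the example trace program. This Python sketch is a simplification and rests on several assumptions: dependences are approximated as "two calls touch the same variable and at least one defines it" (no reaching-definitions analysis), the final "chop each pair" step is replaced by taking the union of the two slices, and line 6 is a hypothetical unrelated call added so that the slice has something to exclude.

```python
from collections import deque

# Trace program as (defined vars, used vars, text). Lines 1 to 5 are the
# slide's example; line 6 is a hypothetical unrelated socket.
PROGRAM = [
    ({"v7"}, set(), "v7 := socket()"),
    ({"v8"}, {"v7"}, "v8 := accept(sock = v7)"),
    (set(), {"v8"}, "write(sock = v8)"),
    ({"v8"}, {"v8"}, "v8 := close(sock = v8)"),
    ({"v7"}, {"v7"}, "v7 := close(sock = v7)"),
    ({"v9"}, set(), "v9 := socket()"),
]


def dependence_edges(program):
    """Edge i -> j (i < j) if both touch a variable and one of them defines it."""
    edges = set()
    for j, (defs_j, uses_j, _) in enumerate(program):
        for i in range(j):
            defs_i, uses_i, _ = program[i]
            if defs_i & (defs_j | uses_j) or uses_i & defs_j:
                edges.add((i, j))
    return edges


def reachable(start, edges, forwards=True):
    """Nodes reachable from start along (or against) the dependence edges."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for a, b in edges:
            src, dst = (a, b) if forwards else (b, a)
            if src == node and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def extract_scenario(program, seed):
    """Forward slice plus backward slice from the seed, in trace order."""
    edges = dependence_edges(program)
    keep = reachable(seed, edges, True) | reachable(seed, edges, False)
    return [program[i][2] for i in sorted(keep)]


# The seed pattern is accept, which is index 1 in this program.
scenario = extract_scenario(PROGRAM, seed=1)
print("\n".join(scenario))
```

With accept as the seed, the backward slice picks up the socket call and the forward slice picks up write and both closes, while the unrelated v9 socket is left out.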

30 Overview of learning
[Figure: scenarios (dep. graphs) -> standardize -> strings -> learn NFA -> NFA]

31 Standardizing scenarios
Scenario 1:
v7 := socket()
v8 := accept(sock = v7)
write(sock = v8)
read(sock = v8)
v8 := close(sock = v8)
v7 := close(sock = v7)
Scenario 2:
v2 := socket()
v3 := accept(sock = v2)
v2 := close(sock = v2)
read(sock = v3)
write(sock = v3)
v3 := close(sock = v3)
Glenn: Maybe first show an abstract scenario that is not the standard scenario, and then show them that the standard scenario is lower lexicographically.

32 Standardizing scenarios
Scenario 1, reordered (scenario 2 is unchanged):
v7 := socket()
v8 := accept(sock = v7)
v7 := close(sock = v7)
read(sock = v8)
write(sock = v8)
v8 := close(sock = v8)

33 Standardizing scenarios
Both scenarios, after renaming values:
X := socket()
Y := accept(sock = X)
X := close(sock = X)
read(sock = Y)
write(sock = Y)
Y := close(sock = Y)

34 Standard scenario = string
A: X := socket()
B: Y := accept(sock = X)
C: X := close(sock = X)
D: read(sock = Y)
E: write(sock = Y)
F: Y := close(sock = Y)
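One way to reproduce the standardization shown on these slides is to take the lexicographically smallest ordering that respects the dependences, and then rename values in order of first appearance. The Python sketch below does that. It is an assumption about the method, guided by the note that "the standard scenario is lower lexicographically"; the dependence rule (same variable, at least one def) and all function names are hypothetical.

```python
import heapq
import re

# Scenarios as (text, defined vars, used vars), copied from the slide.
SCENARIO_1 = [
    ("v7 := socket()", {"v7"}, set()),
    ("v8 := accept(sock = v7)", {"v8"}, {"v7"}),
    ("write(sock = v8)", set(), {"v8"}),
    ("read(sock = v8)", set(), {"v8"}),
    ("v8 := close(sock = v8)", {"v8"}, {"v8"}),
    ("v7 := close(sock = v7)", {"v7"}, {"v7"}),
]
SCENARIO_2 = [
    ("v2 := socket()", {"v2"}, set()),
    ("v3 := accept(sock = v2)", {"v3"}, {"v2"}),
    ("v2 := close(sock = v2)", {"v2"}, {"v2"}),
    ("read(sock = v3)", set(), {"v3"}),
    ("write(sock = v3)", set(), {"v3"}),
    ("v3 := close(sock = v3)", {"v3"}, {"v3"}),
]


def call_name(text):
    return re.search(r"(\w+)\(", text).group(1)


def standardize(scenario):
    n = len(scenario)
    # preds[j]: earlier calls that j depends on (shared variable, one def).
    preds = {j: set() for j in range(n)}
    for j, (_, defs_j, uses_j) in enumerate(scenario):
        for i in range(j):
            _, defs_i, uses_i = scenario[i]
            if defs_i & (defs_j | uses_j) or uses_i & defs_j:
                preds[j].add(i)
    # Smallest-name-first topological order over the dependences.
    done, order = set(), []
    heap = [(call_name(scenario[j][0]), j) for j in range(n) if not preds[j]]
    heapq.heapify(heap)
    while heap:
        _, j = heapq.heappop(heap)
        order.append(j)
        done.add(j)
        for k in range(n):
            if k not in done and j in preds[k] and preds[k] <= done:
                heapq.heappush(heap, (call_name(scenario[k][0]), k))
    # Rename values to X, Y, ... in order of first appearance (4 names max).
    names, result = {}, []
    for j in order:
        text = scenario[j][0]
        for var in re.findall(r"\bv\d+\b", text):
            if var not in names:
                names[var] = "XYZW"[len(names)]
        result.append(re.sub(r"\bv\d+\b", lambda m: names[m.group()], text))
    return result


def to_string(events, alphabet):
    """Map each distinct event to a letter, assigning letters on first use."""
    return "".join(
        alphabet.setdefault(e, chr(ord("A") + len(alphabet))) for e in events
    )


alphabet = {}
print(to_string(standardize(SCENARIO_1), alphabet))
print(to_string(standardize(SCENARIO_2), alphabet))
```

Both scenarios standardize to the same event sequence and therefore to the same string, which is what lets the learner treat them as two observations of one behavior.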

36 Raman's PFSA algorithm
Build a weighted retrieval tree
Merge similar states

37 Raman's PFSA algorithm
Build a weighted retrieval tree
Merge similar states
[Figure: weighted retrieval tree with edges labeled socket, accept, read, write, and end; edge weights shown include 100, 95, 70, 25, and 5]
Ras: I had better say something fundamental here, about what the learner needs to do. Generalize, recover control flow. What do I expect from the learner? This is not a great example, so redo.

39 Raman's PFSA algorithm
Build a weighted retrieval tree
Merge similar states
[Figures, slides 39 to 41: the automaton as similar states are merged, with edges labeled socket, accept, read, write, and end; edge weights shown include 100, 95, 75, 70, 25, and 5]
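The first step, building a weighted retrieval tree, amounts to building a prefix tree (trie) in which every node counts how many strings pass through it. The Python sketch below shows only that step; the state-merging step is omitted because the transcript does not say how similarity between states is measured. The sample strings and their counts are hypothetical and are not the weights from the figure.

```python
def build_retrieval_tree(strings):
    """Return a trie whose nodes count how many strings pass through them."""
    root = {"count": 0, "children": {}}
    for string in strings:
        node = root
        node["count"] += 1
        # An explicit "end" symbol records where strings stop.
        for symbol in list(string) + ["end"]:
            child = node["children"].setdefault(
                symbol, {"count": 0, "children": {}}
            )
            child["count"] += 1
            node = child
    return root


def show(node, depth=0):
    """Print the tree, one edge per line, with its weight."""
    for symbol, child in node["children"].items():
        print("  " * depth + f"{symbol} ({child['count']})")
        show(child, depth + 1)


# Hypothetical sample: five strings of call names.
SAMPLE = (
    [["socket", "accept", "read", "write"]] * 3
    + [["socket", "accept", "write"]]
    + [["socket"]]
)
tree = build_retrieval_tree(SAMPLE)
show(tree)
```

The tree accepts exactly the observed strings. Merging states with similar outgoing behavior is what would let the learner generalize beyond them, for example by turning repeated read and write calls into a loop.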

43 Overview of learning
[Figure: scenarios (dep. graphs) -> standardize -> strings -> learn NFA -> NFA -> classify -> classified strings -> learn NFA -> specification]
Scenario calls: socket(...), accept(...), read(...), write(...), close(...)

44 Classifying scenarios
[Figure, built up over slides 44 to 46: nodes labeled b0, b1, b2, g0, g1, and G0]
Ras: Make sure I motivate this. He also wants to see how it fits into the overall picture.

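47 Concept analysis: mammals
[Table: rows cats, dogs, dolphins, gibbons, humans, whales; columns 4-legged, hairy, smart, marine, thumbed; cell entries not shown]

48 Concept analysis: mammals
Concept: objects {cats, gibbons, dogs, dolphins, humans, whales}, attributes {}

49 Concept analysis: mammals
Concept: objects {cats, gibbons, dogs}, attributes {hairy}

50 Concept analysis: mammals
Concept: objects {cats, dogs}, attributes {4-legged, hairy}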

51 Concept analysis: scenarios
[Table: rows Scen. 0 to Scen. 5; columns "Takes transition 0" to "Takes transition 4"; cell entries not shown]

52 Concept analysis: scenarios
Concept: attributes {}

53 Concept analysis: scenarios
Concept: attributes {Takes transition 1}

54 Concept analysis: scenarios
Concept: attributes {Takes transition 0, Takes transition 1}
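Concept analysis groups objects by the attributes they share: a concept is a set of objects together with exactly the attributes common to all of them. A minimal Python sketch follows. The context below is hypothetical (the table entries did not survive in this transcript), and the short attribute names t0 to t3 stand for "takes transition 0" and so on.

```python
def concepts(context):
    """context: object -> set of attributes. Returns (extent, intent) pairs."""
    all_attrs = frozenset().union(*context.values())
    # Every intent is an intersection of object rows (or the full attribute set).
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {intent & attrs for intent in intents}
    result = []
    for intent in intents:
        # The extent is every object that has all attributes in the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        result.append((extent, intent))
    # Largest extents first, so the most general concept leads the list.
    return sorted(result, key=lambda c: (-len(c[0]), sorted(c[1])))


# Hypothetical scenario table: which NFA transitions each scenario takes.
CONTEXT = {
    "Scen. 0": {"t0", "t1"},
    "Scen. 1": {"t0", "t1"},
    "Scen. 2": {"t1", "t2"},
    "Scen. 3": {"t3"},
}

lattice = concepts(CONTEXT)
for extent, intent in lattice:
    print(sorted(extent), sorted(intent))
```

Each concept is a candidate class of scenarios that behave alike, so an expert can label a whole concept as good or bad instead of labeling scenarios one at a time.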

55 Where to find bugs?
In programs (static verification)? Or in traces (dynamic verification)?

56 How Strauss verifies a spec
[Figure: traces -> extract scenarios -> scenarios (dep. graphs) -> standardize -> strings -> check automaton membership]
Trace line: ... socket(domain = 2, type = 1, proto = 0, return = 7)
Scenario calls: socket(...), accept(...), read(...), write(...), close(...)

57 Experimental results
Mined and verified 7 published X11 specs and one new spec. (90 traces from 72 programs)
Nice to know how many classes were good and how many were bad. Maybe discard states and edges, or move it, or color it differently?

58 Bugs found
X11 selection protocol specs.: found 17 bugs total. English spec is buggy!
XInternAtom should be called during initialization, not in the event loop: 42 buggy programs (out of 72); degree varied
XPutImage spec. (more on this later): 2 different bugs! But, also 2 false positives.
ParseAccelTable: 1 false positive
Ras: Split onto 2 slides and show the state machines! Build up to a finale! Have slide with applications!

59 Classification was useful

60 XSetSelectionOwner
English: Pass XSetSelectionOwner the timestamp from the last event.
For all calls XSetSelectionOwner(time = X), an NFA over the calls:
(event.time = X) := XNextEvent()
XFilterEvent(event.time = X)
XtDispatchEvent(event.time = X)
(event.time = X) := XCheckWindowEvent()
cb_XtActionProc(event.time = X)
XSetSelectionOwner(time = X)

61 XInternAtom
English: Don't call XInternAtom in the event loop.
For all calls XInternAtom(display = X), an NFA over the calls:
cb_XtEventHandler(event.display = X)
cb_XtActionProc(event.display = X)
(event.display = X) := XtAppNextEvent()
(event.display = X) := XNextEvent()
XFilterEvent(event.display = X)
XtCallActionProc(event.display = X)
(event.display = X) := XWindowEvent()
XtDispatchEvent(event.display = X)
(event.display = X) := XCheckWindowEvent()
XInternAtom(display = X)

62 XPutImage
English: The image and GC passed to XPutImage must have been created on the same display.
For all calls XPutImage(display = X, image = Y, gc = Z), an NFA over the calls:
Z := XCreateGC(display = X)
Y := XCreateImage(display = X)
Y := XShmCreateImage(display = X)
XPutImage(display = X, image = Y, gc = Z)
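The English rule for XPutImage can also be checked directly on a trace, which gives a feel for what dynamic verification reports. The Python sketch below is a hand-written checker for this one rule, not the automaton-membership check that Strauss performs. The trace tuples and resource ids are hypothetical, and for brevity the sketch assumes image ids and GC ids never collide.

```python
# Calls that create an image or a GC on a display (from the slide's spec).
CREATORS = {"XCreateGC", "XCreateImage", "XShmCreateImage"}


def check_xputimage(trace):
    """Return indices of XPutImage calls whose image or GC came from another display."""
    created_on = {}  # resource id -> display it was created on
    violations = []
    for index, (ret, name, args) in enumerate(trace):
        if name in CREATORS:
            created_on[ret] = args["display"]
        elif name == "XPutImage":
            display = args["display"]
            if (created_on.get(args["image"]) != display
                    or created_on.get(args["gc"]) != display):
                violations.append(index)
    return violations


# Hypothetical trace: (return value, call name, tracked arguments).
TRACE = [
    ("gc1", "XCreateGC", {"display": "d1"}),
    ("img1", "XCreateImage", {"display": "d1"}),
    ("img2", "XShmCreateImage", {"display": "d2"}),
    (None, "XPutImage", {"display": "d1", "image": "img1", "gc": "gc1"}),
    (None, "XPutImage", {"display": "d1", "image": "img2", "gc": "gc1"}),
]
print(check_xputimage(TRACE))
```

The second XPutImage call is flagged because its image was created on a different display than the one it is drawn on.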

63 Conclusions
Strauss is local, scalable, dynamic
Strauss finds real specs. and real bugs
Does Strauss generalize well? Work for others? Do too much or too little?

64 Suggestions for future work
Mining state transition models
Beyond mining from traces: on-line, profile-based, static
Language support; for example, ESC/Java and loop invariants

65 My publications
On Strauss
Mining specifications. With Rastislav Bodik and James R. Larus. POPL02.
Path profiling
Improving data-flow analysis with path profiles. With James R. Larus. PLDI98. Selected for 20 Years of PLDI.
Exploiting hardware performance counters with flow and context sensitive profiling. With Tom Ball and James R. Larus. PLDI97.

66 End of talk

