Presentation on theme: "Statistical probabilistic model checking"— Presentation transcript:

1 Statistical probabilistic model checking
Håkan L. S. Younes Carnegie Mellon University (now at Google Inc.) My name is Håkan Younes. I will be presenting work I did on probabilistic model checking between 2001 and 2006, as a graduate student and postdoc in the Computer Science Department at Carnegie Mellon University. I am now at Google here in Pittsburgh, but I am not representing Google today, and I will not talk about my current work at Google. I first gave a talk on this topic at an SVC seminar, like today's. Ed Clarke was there, and I am sure he still remembers how horrible a talk it was. I hope that this will be a more pleasant experience.

2 Statistical probabilistic model checking
Introduction Model-independent approach to probabilistic model checking Relies on simulation and statistical sampling Wrong answer possible, but can be bounded (probabilistically) Low memory requirements (can handle large/infinite models) Trivially parallelizable (distributed sampling gives linear speedup) Topics covered in this talk: Error control Hypothesis testing vs. estimation Dealing with unbounded properties/infinite trajectories The title of this talk is “statistical probabilistic model checking.” With this I mean a model-independent approach to probabilistic model checking that relies on simulation and statistical sampling. Because we use statistical sampling, it is possible that our model-checking algorithm produces a wrong answer, but we can bound the probability of that happening. We can make the probability of getting a wrong answer arbitrarily small, if we are willing to wait longer for an answer. The approach typically has very low memory requirements, and it can handle models that would make traditional methods run out of memory. It can even handle infinite models---all we need is a simulator that can generate sample trajectories for the system. Finally, the algorithm is trivially parallelizable. We can use distributed sampling to gain, practically, a linear speedup. So if you have a thousand computers—and I know at least one company where that kind of computing power is readily available---you can solve your model-checking problems a thousand times faster. I will try to keep this talk at a fairly high level. I will start by discussing error control. It is essential to understand what you can expect from a statistical solution method, and what the limitations are. I will spend some time discussing specific sampling techniques that we can use in our algorithm. There has been some confusion over this in the literature, and I hope that after this presentation you will not be fooled by some of the misleading results you may come across. Finally, I will say something about how we can deal with unbounded properties and infinite trajectories in a model-independent way.

3 Probabilistic model checking
Given a model M, a state s, and a property Φ, does Φ hold in s for M? Model: stochastic discrete-event system Property: probabilistic temporal logic formula Example: tandem queuing network (events: arrive, route, depart; queues q1, q2) Before I enter the technical part of the talk, and start discussing sampling and simulation, let me briefly mention what probabilistic model checking is. In model checking, in general, you are given a model M, a state s of that model, and a property Phi. The problem then is to determine whether Phi holds in s for model M. In probabilistic model checking, the model is some stochastic discrete-event system (for example, a Markov chain) and the property is expressed in some probabilistic temporal logic (for example, PCTL or CSL). To give you some idea of the kind of problems that we are addressing, consider the following simple example. Here we have a tandem queuing system consisting of two serially connected queues. The capacity of each queue is 2. It is assumed that the timing of arrival, routing, and departure events is governed by some probability law, so this is a stochastic system. Given this stochastic system, we can ask if the probability is at least 0.1 that both queues become full within 5 minutes. This is an example of a probabilistic model-checking problem. “The probability is at least 0.1 that both queues become full within 5 minutes”
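The statistical approach never builds the model's state space; it only needs a simulator that can generate sample trajectories. Purely as an illustration, here is a minimal sketch of such a simulator for the two-queue tandem network; the function name simulate_tandem, the exponential rates lam, mu1, mu2, and the blocking behaviour of the route event are my assumptions, not details given in the talk.

```python
import random

def simulate_tandem(lam=1.0, mu1=1.2, mu2=1.5, capacity=2, horizon=5.0, seed=None):
    """Generate one sample trajectory of the tandem queuing network.

    Events: 'arrive' (into queue 1), 'route' (queue 1 -> queue 2),
    'depart' (from queue 2).  Returns a list of (time, (q1, q2)) pairs.
    """
    rng = random.Random(seed)
    t, q1, q2 = 0.0, 0, 0
    trajectory = [(t, (q1, q2))]
    while t < horizon:
        # Events enabled in the current state, with their exponential rates.
        rates = {}
        if q1 < capacity:
            rates['arrive'] = lam
        if q1 > 0 and q2 < capacity:
            rates['route'] = mu1
        if q2 > 0:
            rates['depart'] = mu2
        t += rng.expovariate(sum(rates.values()))                  # time of next event
        event = rng.choices(list(rates), weights=list(rates.values()))[0]
        if event == 'arrive':
            q1 += 1
        elif event == 'route':
            q1, q2 = q1 - 1, q2 + 1
        else:
            q2 -= 1
        trajectory.append((t, (q1, q2)))
    return trajectory
```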

4 Probabilistic temporal logic (PCTL, CSL)
Standard logic operators: ¬Φ, Φ ∧ Φ, … Probabilistic operator: P≥θ[φ] Holds in state s iff probability is at least θ for paths satisfying φ and starting in s Bounded until: Φ U≤T Ψ Holds over path σ iff Ψ becomes true along σ within time T, and Φ is true until then Unbounded until: Φ U Ψ Holds over path σ iff Ψ becomes true eventually along σ, and Φ is true until then As I said, properties are formulae in some probabilistic temporal logic, for example, PCTL or CSL. Such logics include standard logic formulae with negation and conjunction. In addition, there is a probabilistic operator—a “path quantifier”—that lets us express bounds on the probability of path-based properties of stochastic systems. Lowercase phi is a path formula, and the probabilistic formula holds in state s if and only if the probability measure is at least theta for the set of paths that start in s and satisfy the path formula phi. Path formulae are constructed using a time-bounded until operator. The time-bounded until formula holds over a path sigma if and only if Psi becomes true along sigma within T time units and Phi holds continuously until then. We can also use unbounded until to express long-run properties of stochastic systems. The unbounded until formula holds over a path sigma if and only if Psi eventually becomes true along sigma and Phi holds continuously until then.
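Given a sampled trajectory, deciding a time-bounded until formula is a simple scan. The sketch below assumes the trajectory is a list of (time, state) pairs and that the state formulae Φ and Ψ have already been reduced to state predicates phi and psi; it is an illustration, not the implementation used in the talk.

```python
def check_bounded_until(trajectory, phi, psi, T):
    """Decide the path formula  Phi U<=T Psi  on one sampled trajectory.

    trajectory: list of (time, state) pairs in increasing time order
    phi, psi:   state predicates (state formulae assumed decidable per state)
    """
    for time, state in trajectory:
        if time > T:
            return False       # time bound passed without reaching a Psi-state
        if psi(state):
            return True        # Psi reached within the bound, Phi held until now
        if not phi(state):
            return False       # Phi violated before Psi became true
    return False               # trajectory ended before Psi became true

# Example, reusing the simulator sketched earlier:
#   traj = simulate_tandem(seed=42)
#   holds = check_bounded_until(traj, lambda s: True, lambda s: s == (2, 2), T=5.0)
```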

5 Statistical probabilistic model checking
Property examples “The probability is at least 0.1 that both queues become full within 5 minutes” P≥0.1[ true U≤5 full1 ∧ full2 ] “The probability is at most 0.05 that the second queue becomes full before the first queue” P≤0.05[ ¬full1 U full2 ] Here are a couple of examples that show you how to express properties in a logic such as CSL. First, consider the property of a queuing system from an earlier slide: “the probability is at least 0.1 that both queues become full within 5 minutes”. This property can be represented using a probabilistic operator with threshold 0.1 and an until formula with time bound 5. The second example uses unbounded until to express the property that “the probability is at most 0.05 that the second queue becomes full before the first queue.”

6 The problem (in detail)
Before we propose a solution, we need to fully define the problem Possible outcomes of model-checking algorithm Ideal vs. realistic error control Hopefully, you have some idea of what probabilistic model checking is. Before we can present our solution based on statistical sampling and simulation, we need to understand what problem we can feasibly solve. In the next part of the talk, I will discuss possible outcomes of model-checking algorithms, and ideal vs. realistic error control. Once we have a clear definition of the requirements of our model-checking algorithm, coming up with the actual algorithm will be straightforward.

7 Possible outcomes of model-checking algorithm
Given a state s and a formula Φ, a model-checking algorithm A can: Accept Φ as true in s Reject Φ as false in s Return an undecided result An error occurs if: A rejects Φ when Φ is true (false negative) A accepts Φ when Φ is false (false positive) Note: an undecided result is not an error, but still not desirable First, what result do we expect our model-checking algorithm to produce? Given a model-checking problem, including a state s and a formula Phi, the expected outcome of a model-checking algorithm is that the property Phi is accepted as true in state s, or that Phi is rejected as false in s. We can permit a third outcome as well, which is that the algorithm produces an undecided result. This third outcome should be interpreted as the algorithm saying that the problem is “too close to call”. Those are the three possible outcomes. We say that a model-checking algorithm commits an error if it rejects a true property or accepts a false property. The first error is a false negative while the second error is a false positive. Note that it is never an error to produce an undecided result. In other words, an algorithm could avoid errors by always returning an undecided result, but this is obviously undesirable.

8 Statistical probabilistic model checking
Ideal error control Bound the probability of false negatives/positives and undecided results under all circumstances Bound on false negatives: Pr[A rejects Φ | Φ is true in s] ≤ α Bound on false positives: Pr[A accepts Φ | Φ is false in s] ≤ β Bound on undecided results: Pr[A returns an undecided result] ≤ γ If α, β, and γ are all low, then model-checking algorithm A produces a correct result with high probability A reasonable model-checking algorithm should bound the probability of false negatives and false positives, while simultaneously bounding the probability of an undecided result. We denote these bounds by alpha, beta, and gamma. An algorithm that simultaneously satisfies the three constraints shown here, for alpha, beta, and gamma close to zero, is guaranteed to produce a correct answer with probability close to one.

9 Unrealistic expectations
Ideal error control for verifying probabilistic formula P≥θ[φ] in state s [Figure: probability of accepting P≥θ[φ] as true in s, plotted against the actual probability p of φ holding; for p below θ the acceptance probability should stay below β (false positives), for p above θ it should stay above 1 – α – γ, with the remainder split between false negatives and undecided results.] Consider a probabilistic statement with path formula phi and threshold theta. For any model-checking algorithm, we can plot the probability of accepting the probabilistic statement as true against the probability that the path formula holds. Along the horizontal axis, we have the threshold theta. Along the vertical axis, we have the bounds beta, alpha, and gamma. If we plot the desired probability of accepting the probabilistic statement as true for a hypothetical model-checking algorithm, we get the green curve. In particular, the probability of acceptance should be at most beta when the property does not hold, which is to the left of theta, and the probability of acceptance should be at least 1 – alpha – gamma to the right of theta. Unfortunately, it is unrealistic to expect any practical model-checking algorithm to perform in this way. It would require that we could determine the probability of phi holding with arbitrary precision, so that we would know on which side of theta we are. Next, I will discuss how current solution methods soften these expectations to become practical.

10 Statistical probabilistic model checking
Relaxing the problem Indifference region of width 2δ centered around probability thresholds Probabilistic operator: P≥θ[φ] Holds in state s if probability is at least θ + δ for paths satisfying φ and starting in s Does not hold if probability is at most θ – δ for paths satisfying φ and starting in s “Too close to call” if probability is within δ distance of θ (indifference) Essentially three-valued logic, but we care only about true and false The idea is to introduce indifference regions of width two delta around probability thresholds. We modify the semantics of the probabilistic operator as follows. We say that a probabilistic statement holds in state s if the probability measure is at least theta + delta for the set of paths that start in s and satisfy the path formula phi. The statement does not hold if the probability measure is at most theta – delta. In all other cases, that is, if the probability measure is within a delta distance of the threshold theta, the statement is said to be neither true nor false. It is simply too close to call. If the original logic is CSL, then we call the logic with the modified semantics CSL delta. We will show that existing algorithms for CSL model checking are better understood as algorithms for CSL-delta model checking.
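Restated as code, the relaxed semantics is a three-valued function of the (in practice unknown) probability p; the name relaxed_truth is mine, and None stands in for the undecided, too-close-to-call case. The sampling plans in the rest of the talk exist precisely because p is never available exactly.

```python
def relaxed_truth(p, theta, delta):
    """Truth value of P>=theta[phi] under the indifference-region semantics,
    given the probability p that phi holds."""
    if p >= theta + delta:
        return True       # holds, with margin delta
    if p <= theta - delta:
        return False      # is false, with margin delta
    return None           # within delta of theta: too close to call
```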

11 Error control for relaxed problem
Option 1: bound the probability of false positives/negatives outside of the indifference region; no undecided results Bound on false negatives: Pr[A rejects Φ | Φ is true with margin δ] ≤ α Bound on false positives: Pr[A accepts Φ | Φ is false with margin δ] ≤ β No undecided results: γ = 0, Pr[A returns an undecided result] = 0 Option 2: bound the probability of undecided results outside of the indifference region; low error probability under all circumstances Bound on false negatives: Pr[A rejects Φ | Φ is true] ≤ α Bound on false positives: Pr[A accepts Φ | Φ is false] ≤ β Bound on undecided results: Pr[A returns an undecided result | Φ is true or false with margin δ] ≤ γ Armed with the relaxed semantics of CSL delta, we change the constraints for error control as follows. We change the reference point for false negatives and positives to CSL delta instead of CSL. This means that we want the probability of rejecting a property Phi as false to be at most alpha when Phi holds with some margin, and the probability of accepting Phi as true when Phi is false with some margin should be at most beta. With this relaxation, we actually do not need undecided results, so we can satisfy the third constraint with gamma set to zero.

12 Realistic error control—no undecided results
Error control for verifying probabilistic formula P≥θ[φ] in state s, with no undecided results: high error probability in the indifference region. [Figure: probability of accepting P≥θ[φ] as true in s, plotted against the actual probability p of φ holding; the acceptance probability stays below β (false positives) for p ≤ θ – δ and above 1 – α (bounding false negatives) for p ≥ θ + δ, but inside the indifference region (θ – δ, θ + δ) the error probability may be high.]

13 Realistic error control—with undecided results
Error control for verifying probabilistic formula P≥θ[φ] in state s, with undecided results: high probability of an undecided result in the indifference region. [Figure: acceptance and rejection probabilities for P≥θ[φ] plotted against the actual probability p of φ holding; outside the indifference region the correct outcome dominates, while inside the indifference region (θ – δ, θ + δ) the result is most likely undecided.]

14 Statistical probabilistic model checking
The solution Statistical sampling (hypothesis testing vs. estimation) Undecided results Avoiding infinite sample trajectories in simulation for unbounded until

15 Statistical probabilistic model checking
Verifying probabilistic properties—no undecided results Younes & Simmons (CAV’02, Information and Computation’06) Use acceptance sampling to verify P≥θ[φ] in state s Test hypothesis H0: p ≥ θ + δ against hypothesis H1: p ≤ θ – δ Observation: verify φ over sample trajectories generated using simulation

16 Acceptance sampling with fixed sample size
Single sampling plan: n, c Generate n sample trajectories Accept H0: p ≥ θ + δ iff more than c paths satisfy φ Pick n and c such that: Probability of accepting H1 when H0 holds is at most α Probability of accepting H0 when H1 holds is at most β Sequential single sampling plan: Accept H0 after m < n observations if more than c observations are positive Accept H1 after m < n observations if at most k observations are positive and k + (n – m) ≤ c
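As a rough sketch of both ideas on this slide, the following brute-force search finds a (not necessarily optimal) single sampling plan, and curtailed_test applies it sequentially. The function names are mine, and the exhaustive search is only practical for much wider indifference regions than those used in the experiments later in the talk.

```python
from math import comb

def binom_cdf(c, n, p):
    """P[X <= c] for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c + 1))

def single_sampling_plan(theta, delta, alpha, beta, n_max=2000):
    """Search for <n, c> such that accepting H0 (p >= theta + delta) iff more
    than c of n observations are positive keeps the false-negative probability
    below alpha and the false-positive probability below beta."""
    p0, p1 = theta + delta, theta - delta
    for n in range(1, n_max + 1):
        for c in range(n):
            false_neg = binom_cdf(c, n, p0)       # accept H1 although p = theta + delta
            false_pos = 1 - binom_cdf(c, n, p1)   # accept H0 although p = theta - delta
            if false_neg <= alpha and false_pos <= beta:
                return n, c
    raise ValueError("no plan found with at most n_max observations")

def curtailed_test(observations, n, c):
    """Sequential use of the plan <n, c>: stop as soon as the outcome is fixed."""
    positives = 0
    for m, x in enumerate(observations, start=1):
        positives += x
        if positives > c:
            return True                    # accept H0: p >= theta + delta
        if positives + (n - m) <= c:
            return False                   # accept H1: p <= theta - delta
    return None                            # fewer than n observations supplied
```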

17 Graphical representation of sequential single sampling plan
Continue until a line is crossed [Figure: number of positive observations plotted against the number of observations; an acceptance line at c positive observations and a rejection line reached after n – c negative observations divide the plane into accept, continue, and reject regions, and the curve starts at zero observations.] We can find an acceptance and a rejection line given the parameters theta, delta, alpha, and beta. We then plot a curve representing the outcome of our test starting with zero samples and zero positive samples. As long as the curve stays between the two lines, we continue sampling. If the curve crosses the rejection line, we reject the hypothesis, and if it crosses the acceptance line, we accept the hypothesis.

18 Statistical probabilistic model checking
Sequential probability ratio test (SPRT) Wald (Annals of Mathematical Statistics’45) More efficient than sequential single sampling plan After m observations, k positive, compute ratio: Accept H0: p   +  if  ≤  ∕ (1 – ) Accept H1: p   –  if  ≥ (1 – ) ∕  No fixed upper bound on sample size, but much smaller on average First, consider the approach proposed by myself at CAV’02, which is based on statistical hypothesis testing. Possibly the most straightforward implementation of this approach uses what is known as a single sampling plan. This means that we pick a sample size n and an acceptance number c. We generate n sample execution paths for our stochastic system. A probabilistic statement is accepted as true if and only if more than c of these paths satisfy the path formula phi. The probability of accepting a probabilistic statement as true is given by the cumulative distribution function for the binomial distribution, as shown here. We can pick n large enough so this probability is at least 1 – alpha if p is at least theta + delta, and at most beta if p is at most theta – delta. An alternative approach is to use sequential acceptance sampling, which often results in much smaller sample sizes, but the error control is the same. Younes Statistical probabilistic model checking

19 Graphical representation of SPRT
Continue until a line is crossed [Figure: number of positive observations plotted against the number of observations for the SPRT; acceptance and rejection lines divide the plane into accept, continue, and reject regions, and the curve starts at zero observations.] The construction is the same as for the sequential single sampling plan on the previous slide; here the acceptance and rejection lines are parallel.

20 Statistical estimation Hérault et al. (VMCAI’04)
Estimate p with a confidence interval of width 2δ Accept H0: p ≥ θ + δ iff the center of the confidence interval is at least θ Choosing sample size (Hoeffding bound): nest = ⌈ln(2∕α) ∕ (2δ²)⌉ Same as a single sampling plan with acceptance number nθ; never more efficient!

θ      α = β    nest      nopt      nest ∕ nopt
0.5    10−2     26,492    13,527    1.96
       10−8     95,570    39,379    2.43
                          78,725    1.21
0.9                        4,861    5.45
                          13,982    6.84
                          28,280    3.38

Hérault et al., at VMCAI two years ago, described an alternative approach based on statistical estimation instead of hypothesis testing. Given a sample of size n, compute an estimate of p by dividing the number of sample paths satisfying a path formula by n. The appropriate sample size to satisfy the desired error bounds can be derived using a result by Hoeffding, shown here. It can be seen that the sample size is inversely proportional to delta squared and proportional to the logarithm of the error bound. If the probability estimate is greater than or equal to the threshold of the probabilistic statement, then the statement is accepted as true. Hérault et al. suggested, in their VMCAI paper, that this approach has several advantages over hypothesis testing. It turns out, however, that the approach is equivalent to a single sampling plan with acceptance number n times theta. But this means that hypothesis testing is always at least as efficient as statistical estimation, when used for probabilistic model checking. Typically, hypothesis testing is much more efficient, as shown in the next slide.
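A sketch of the estimation-based check follows. The specific Hoeffding constant used here, n = ⌈ln(2∕α) ∕ (2δ²)⌉, is my reading of the bound (it reproduces the nest values in the table, e.g. 26,492 for α = δ = 10−2), and the function names are mine rather than Hérault et al.'s.

```python
from math import ceil, log

def estimation_sample_size(delta, alpha):
    """Two-sided Hoeffding bound on the sample size needed so that the
    empirical frequency is within delta of p with probability >= 1 - alpha."""
    return ceil(log(2 / alpha) / (2 * delta ** 2))

def estimate_and_decide(sample_source, theta, delta, alpha):
    """Estimation-based check of P>=theta[phi]: draw a fixed-size sample and
    accept iff the point estimate (the center of the width-2*delta confidence
    interval) is at least theta.  sample_source() is assumed to return 0 or 1."""
    n = estimation_sample_size(delta, alpha)
    positives = sum(sample_source() for _ in range(n))
    return positives / n >= theta
```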

21 Acceptance sampling with undecided results Younes (VMCAI’06)
Simultaneous acceptance sampling plans Test 1: H0: p ≥ θ against H1: p ≤ θ – δ Test 2: H0′: p ≥ θ + δ against H1′: p ≤ θ Combining the results Accept P≥θ[φ] if H0 and H0′ are accepted Reject P≥θ[φ] if H1 and H1′ are accepted Undecided result otherwise In practice, we can accomplish the desired error control by using two simultaneous acceptance sampling plans. The first is used to test the hypothesis that p is greater than theta against the hypothesis that p is at most theta – delta. The second is used to test the hypothesis that p is at least theta + delta against the hypothesis that p is at most theta. The outcomes of the two sampling plans are combined as follows. A probabilistic statement is accepted as true if both null hypotheses are accepted. The statement is rejected as false if both alternative hypotheses are accepted. In all other cases, the outcome is an undecided result.
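Here is a sketch of how the two tests combine, reusing the SPRT sketched earlier for both component tests. Choosing the strengths of the component tests so that the desired overall α, β, γ bounds hold is worked out in the VMCAI'06 paper and is simplified away here, where both tests just reuse the same (α, β); likewise, the paper lets both tests observe the same samples, whereas this sketch simply draws fresh ones for each.

```python
def verify_with_undecided(sample_source, theta, delta, alpha, beta):
    """Acceptance sampling with undecided results: combine the two tests of
    this slide into accept (True) / reject (False) / undecided (None).
    sample_source() is assumed to return one 0/1 observation per call."""
    def draws():
        while True:
            yield sample_source()

    half = delta / 2
    # Test 1:  H0:  p >= theta          vs  H1:  p <= theta - delta
    low = sprt(draws(), theta - half, half, alpha, beta)
    # Test 2:  H0': p >= theta + delta  vs  H1': p <= theta
    high = sprt(draws(), theta + half, half, alpha, beta)

    if low is True and high is True:
        return True        # accept P>=theta[phi]
    if low is False and high is False:
        return False       # reject P>=theta[phi]
    return None            # undecided
```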

22 Graphical representation of SPRT with undecided results
Continue until a line is crossed [Figure: number of positive observations plotted against the number of observations for the two simultaneous tests; the boundary lines divide the plane into accept, undecided, and reject regions, with continue regions between them, and the curve again starts at zero observations.] We make observations until the curve leaves the continue regions; depending on which lines it crosses, the outcome is accept, reject, or undecided.

23 Statistical probabilistic model checking
Unbounded until—avoiding infinite sample trajectories Younes (unpublished manuscript) Premature termination with probability pt after each state transition Ensures finite sample trajectories Change the value of a positive sample trajectory σ from 1 to (1 – pt)^(–|σ|) Inspired by Monte Carlo method for matrix inversion by Forsythe & Leibler (1950) Observations no longer 0 or 1: previous methods do not apply Use sequential estimation by Chow & Robbins (1965) Lower pt means fewer samples, but longer trajectories Note: Sen et al. (CAV’05) tried to handle unbounded until with a termination probability, but the approach is flawed because the observations are still 0 or 1
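The reweighting idea can be sketched as follows; step is an assumed one-transition simulator and the function name is mine. Because the resulting observations are real-valued rather than 0 or 1, the acceptance-sampling tests above no longer apply, which is why the slide turns to sequential estimation in the style of Chow and Robbins.

```python
import random

def unbounded_until_observation(initial_state, step, phi, psi, p_term, rng=random):
    """One observation for the unbounded until  Phi U Psi, using premature
    termination with probability p_term after every transition.

    A trajectory that reaches a Psi-state after k transitions survives the
    k termination checks with probability (1 - p_term)**k, so it is given the
    value (1 - p_term)**(-k); this keeps the observation an unbiased estimate
    of the probability of the until formula while guaranteeing termination.
    """
    state, k = initial_state, 0
    while True:
        if psi(state):
            return (1 - p_term) ** (-k)   # positive trajectory, reweighted
        if not phi(state):
            return 0.0                    # negative trajectory
        if rng.random() < p_term:
            return 0.0                    # terminated prematurely
        state = step(state)               # one simulated transition
        k += 1
```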

24 Statistical probabilistic model checking
Empirical evaluation

25 Numerical vs. statistical (tandem queuing network) Younes et al. (TACAS’04)
Property: P≥0.5(true U≤T full) [Figure: verification time (seconds, log scale from 10−2 to 106) plotted against the size of the state space (from 101 to 1011), comparing the numerical method (ε = 10−6) with the statistical method (α = β = 10−2, δ = 0.5·10−2).]

26 Numerical vs. statistical (symmetric polling system) Younes et al. (TACAS’04)
Property: serv1 ⇒ P≥0.5(true U≤T poll1) [Figure: verification time (seconds, log scale from 10−2 to 106) plotted against the size of the state space (from 102 to 1014), comparing the numerical method (ε = 10−6) with the statistical method (α = β = 10−2, δ = 0.5·10−2).]

27 Undecided results (symmetric polling system) Younes (VMCAI’06)
Property: serv1 ⇒ P≥0.5[ true U≤T poll1 ] [Figure: verification time (seconds, 5 to 20) plotted against the formula time bound T, from 14 to 14.5, for the method without undecided results (γ = 0) and the method with undecided results (γ = 10–2); a shaded box marks the indifference region.] This graph shows the performance of the new statistical solution method, and compares it with the old solution method. We are verifying the statement shown in the box for a symmetric polling system, with varying time bound for the until formula. The property holds for larger time bounds and is false for lower time bounds. The light-blue box indicates the indifference region. The results shown here were obtained using sequential acceptance sampling. We can see that the alternative method requires longer verification times, and this is attributed to larger sample sizes. This should be expected because the alternative method relies on hypothesis testing with half as wide indifference regions, which results in roughly four times as large sample sizes.

28 Undecided results (symmetric polling system)
 =  =  = 10–2 result 14.10 14.15 14.20 14.25 14.30 14.35 14.40 accept 3 9 50 88 97 100 reject 91 12 32 99 42 1 undecided 58 68 If we look more closely at the results produced with the two approaches, however, we see that with undecided results we produce fewer errors, so the higher verification time is not for nothing. At the top, we have the outcome of 100 experiments for the standard statistical solution method, without undecided results. We can see that close to where the property goes from being false to being true, more errors are committed. For the time bound 14.25, half of the runs result in an erroneous result. In contrast, the alternative solution method produces no errors at all, and instead produce mostly undecided results when the time bound is close to Younes Statistical probabilistic model checking

29 Statistical probabilistic model checking
Thank you! Questions?

