Statistical Probabilistic Model Checking

Presentation transcript:

Statistical Probabilistic Model Checking
Håkan L. S. Younes, Carnegie Mellon University

Yesterday, Scott Smolka presented a statistical approach for LTL model checking. This is also a talk about using statistical techniques for model checking, but unlike Scott I will talk about probabilistic model checking.

Introduction
- Model checking for stochastic processes: stochastic discrete event systems, probabilistic time-bounded properties
- Model independent approach: discrete event simulation, statistical hypothesis testing

By probabilistic model checking I mean that the model is a stochastic process, or more precisely a stochastic discrete event system. The properties that we will be looking at are also probabilistic. We will, in particular, look at the problem of verifying probabilistic time-bounded properties. I will present a model independent algorithm for verifying such properties for stochastic discrete event systems. The algorithm is based on discrete event simulation and statistical hypothesis testing. We make no assumptions regarding the dynamics of the system, other than that we can generate simulation traces for the system. I will tell you shortly exactly what kind of properties we can verify using our approach, but first let us look at an example system.

Example: Tandem Queuing Network

[Figure: a sample path through the network, with events arrive, arrive, route, depart:
q1 = 0, q2 = 0 (t = 0) → q1 = 1, q2 = 0 (t = 1.2) → q1 = 2, q2 = 0 (t = 3.7) → q1 = 1, q2 = 1 (t = 3.9) → q1 = 1, q2 = 0 (t = 5.5)]

Consider a simple system consisting of two serially connected queues, each of capacity two. Messages arrive at the first queue, get routed to the second queue after some time, and leave the second queue after some additional time. The time between arrivals and the time that a message remains in each of the queues are random variables. Say that we start off with both queues being empty at time zero. The hourglass represents the fact that time is a continuous quantity. Say that after some time, in this case 1.2 seconds, a message arrives at the first queue. This causes a transition to a state where the first queue contains one message and the second queue is empty. After additional time, another message may arrive at the first queue, which then contains two messages. One of the messages in the first queue then gets routed to the second queue. After another 1.6 seconds, the message leaves the second queue, leaving the system in a state at time 5.5 seconds where the first queue contains one message and the second queue is empty. This sequence of states is a path or trace for the system. The system is stochastic because the time we spend in a state is a random variable. Given a stochastic system like this tandem queuing network, we can ask the following question: with both queues empty, is the probability less than 0.5 that both queues become full within 5 seconds? For the given path, both queues did not become full within 5 seconds, but that is just a single path. This question represents a probabilistic model checking problem.
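To make the sample-path idea concrete, here is a minimal simulation sketch in Python. The slides do not fix the arrival and service distributions for this introductory example (the later case study uses Coxian service times), so the exponential distributions, the rates, and the blocking behaviour below are made-up illustrations; memorylessness is what justifies redrawing all delays after each transition.

    import random

    def simulate_tandem(arrival_rate, route_rate, depart_rate,
                        capacity=2, horizon=5.0, rng=random):
        """Generate one sample path of the two-queue tandem network and report
        whether both queues become full before `horizon` (here, 5 seconds)."""
        t, q1, q2 = 0.0, 0, 0
        while True:
            # Delays of the currently enabled events (exponential by assumption).
            events = {}
            if q1 < capacity:
                events['arrive'] = rng.expovariate(arrival_rate)
            if q1 > 0 and q2 < capacity:
                events['route'] = rng.expovariate(route_rate)
            if q2 > 0:
                events['depart'] = rng.expovariate(depart_rate)
            event, delay = min(events.items(), key=lambda kv: kv[1])
            t += delay
            if t > horizon:
                return False          # time bound exceeded
            if event == 'arrive':
                q1 += 1
            elif event == 'route':
                q1, q2 = q1 - 1, q2 + 1
            else:
                q2 -= 1
            if q1 == capacity and q2 == capacity:
                return True           # both queues full within the time bound

    # One observation for the question on this slide (rates are hypothetical):
    print(simulate_tandem(arrival_rate=1.0, route_rate=1.2, depart_rate=0.8))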

Probabilistic Model Checking

Given a model M, a state s, and a property φ, does φ hold in s for M?
- Model: stochastic discrete event system
- Property: probabilistic temporal logic formula

In general, a probabilistic model checking problem can be formulated as follows. Given a model M, a state s of this model, and a property phi, does phi hold in s for M? In our case, the model is a stochastic discrete event system and the property is expressed as a formula in some probabilistic temporal logic.

Continuous Stochastic Logic (CSL)
- State formulas: truth value is determined in a single state
- Path formulas: truth value is determined over a path
- Discrete-time analogue: PCTL

We use CSL, the continuous stochastic logic, for expressing properties of continuous-time stochastic systems. In CSL there are state formulas and path formulas. The truth value of a state formula is determined in a single state, while the truth value of a path formula is determined over a path. An essentially analogous logic for discrete-time systems is known as PCTL.

State Formulas
- Standard logic operators: ¬φ, φ1 ∧ φ2, …
- Probabilistic operator: P≥θ(ρ)
  Holds in state s iff the probability is at least θ that ρ holds over paths starting in s
- P<θ(ρ) ≡ ¬P≥1−θ(ρ)

State formulas are regular logic expressions, with logic operators such as negation and conjunction. In addition, there is a probabilistic operator. Rho is a path formula, and the probabilistic formula holds in a state s if and only if the probability is at least theta that the path formula rho holds over paths starting in s. It is sufficient to consider only greater-than-or-equal-to: strictly-less-than can be expressed using negation and greater-than-or-equal-to. Also, since we use a statistical approach, we cannot really differentiate between strictly-less-than and less-than-or-equal-to, or strictly-greater-than and greater-than-or-equal-to. In full CSL there is also a steady-state operator, but we do not have a model independent approach for dealing with it, so I will not discuss it further in this talk.

Path Formulas
- Until: φ1 U≤T φ2
  Holds over path σ iff φ2 becomes true in some state along σ before time T, and φ1 is true in all prior states

The primary path formula of interest is the until formula. An until formula holds over a path sigma if and only if phi 2 becomes true in some state along sigma before time T, and phi 1 is true in all prior states. While T can be infinite, corresponding to an unbounded until formula, this talk will only involve properties with a finite time bound. We use the time bound when generating sample paths using discrete event simulation.
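As a sketch of what "verifying φ1 U≤T φ2 over a path" might look like in code, the function below walks a recorded timed path; the path representation (a list of (entry time, state) pairs) and the predicate arguments are my own illustrative choices, not anything prescribed by the slides.

    def check_until(path, phi1, phi2, T):
        """Check phi1 U<=T phi2 over a timed path.

        `path` is a list of (entry_time, state) pairs in increasing time order;
        `phi1` and `phi2` are predicates over states.  Returns True iff some state
        entered no later than time T satisfies phi2 and every earlier state
        satisfies phi1."""
        for entry_time, state in path:
            if entry_time > T:
                return False         # time bound passed before phi2 became true
            if phi2(state):
                return True
            if not phi1(state):
                return False         # phi1 violated before phi2 was reached
        return False

    # The sample path from the tandem queuing network example:
    path = [(0.0, (0, 0)), (1.2, (1, 0)), (3.7, (2, 0)), (3.9, (1, 1)), (5.5, (1, 0))]
    print(check_until(path, lambda s: True, lambda s: s == (2, 2), T=5))  # False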

CSL Example

With both queues empty, is the probability less than 0.5 that both queues become full within 5 seconds?
- State: q1 = 0 ∧ q2 = 0
- Property: P<0.5(true U≤5 q1 = 2 ∧ q2 = 2)

The question we asked about the tandem queuing network can be formulated as a CSL model checking problem. Both queues being empty corresponds to a state with q1 and q2 being zero. The property that the probability is less than 0.5 that both queues become full within 5 seconds can be expressed in CSL with a probabilistic state formula, where the path formula is an until formula with time bound 5. To answer the posed question, we check if the CSL formula holds in the given state.

Model Checking Probabilistic Time-Bounded Properties
- Numerical methods: provide highly accurate results; expensive for systems with many states
- Statistical methods: low memory requirements; adapt to the difficulty of the problem (sequential); expensive if high accuracy is required

So how do we solve problems like this? Two different approaches have been suggested in the literature: one based on the numerical computation of probabilities and one based on statistical hypothesis testing. Numerical methods are particularly well-suited when high accuracy in the verification result is required, but scale poorly with an increase in the size of the state space of the model. A statistical method, on the other hand, has very low memory requirements. It can also adapt to the difficulty of the problem at hand, and I will tell you exactly what I mean by that a little bit later. However, statistical hypothesis testing tends to be expensive if high accuracy is required, so there are pros and cons with both approaches.

Statistical Solution Method [Younes & Simmons 2002]
- Use discrete event simulation to generate sample paths
- Use acceptance sampling to verify probabilistic properties
- Hypothesis: P≥θ(ρ)
- Observation: verify ρ over a sample path
- Not estimation!

The focus of this talk is on a statistical solution method for CSL model checking, proposed by Younes and Simmons in a CAV paper from 2002. This approach takes advantage of the fact that CSL model checking is a hypothesis testing problem rather than an estimation problem. The approach uses discrete event simulation to generate sample paths, and a technique called acceptance sampling to verify probabilistic properties. The hypothesis we are testing is whether the path formula rho holds with at least probability theta. An observation in our case is the result of verifying rho over a sample path. I want to stress once again that we are not estimating the probability of rho holding!

Error Bounds
- Probability of false negative: ≤ α
  We say that φ is false when it is true
- Probability of false positive: ≤ β
  We say that φ is true when it is false

Since we use a statistical approach, we have to accept some probability of error. However, we should at least bound the probability of error. There are two types of error: false negatives and false positives. We want the probability of a false negative to be at most alpha, and the probability of a false positive to be at most beta.

Performance of Test

[Figure: probability of accepting P≥θ(ρ) as true (y-axis, marked at 1 − α and β) plotted against the actual probability of ρ holding (x-axis, marked at θ)]

The performance of an acceptance sampling test can be viewed graphically as follows. Along the x-axis we have the actual probability of rho holding, and there we have our threshold theta. Along the y-axis we have the probability that we accept the hypothesis as true when using the test. There we have our error bounds alpha and beta. We do not know the actual probability of rho holding, but if it is to the right of theta then the property we are verifying is true. If it is to the left of theta, then the property is false.

Ideal Performance of Test

[Figure: ideal performance curve, probability of accepting P≥θ(ρ) as true vs. actual probability of ρ holding; false negatives bounded by α to the right of θ, false positives bounded by β to the left of θ. Unrealistic!]

Ideally we want a test that accepts the hypothesis with high probability (at least 1 − alpha) if it is really true, and accepts the hypothesis with low probability (at most beta) if it is really false. This kind of performance is plotted here, but we soon realize that it is unrealistic to expect there to be an efficient test with a performance like this.

Realistic Performance of Test

[Figure: realistic performance curve, probability of accepting P≥θ(ρ) as true vs. actual probability of ρ holding, with an indifference region of width 2δ between p1 and p0 around θ; false negatives bounded by α above p0, false positives bounded by β below p1]

We therefore relax the problem by introducing an indifference region of width two delta centered around our threshold. We want a high probability of accepting the hypothesis if the true probability of rho holding is to the right of the indifference region, and a low probability of accepting the hypothesis if the probability is to the left of the indifference region. If the probability of rho holding is in the indifference region, we are willing to accept a high error probability. This relaxation makes it possible to construct an efficient test.
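Written out in the notation of the preceding slides, with p denoting the unknown probability that ρ holds over paths from the state in question, the relaxed requirement on the test is:

    \Pr[\text{accept } P_{\ge\theta}(\rho)] \;\ge\; 1 - \alpha \quad \text{whenever } p \ge p_0 = \theta + \delta
    \Pr[\text{accept } P_{\ge\theta}(\rho)] \;\le\; \beta \quad \text{whenever } p \le p_1 = \theta - \delta

No constraint is placed on the test when p lies inside the indifference region (p1, p0).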

Sequential Acceptance Sampling [Wald 1945]

True, false, or another observation?

One particular way of doing acceptance sampling is sequential acceptance sampling. This technique was developed by Wald in a truly amazing paper from 1945. If I got to take one thing with me to a deserted island, it would be this paper. A sequential test works as follows. We make observations of the truth value of rho, and after each observation we need to decide whether to accept or reject the hypothesis, or if we need to make more observations. Each observation can be viewed as the outcome of a coin flip, and we want to determine if the probability of, say, heads coming up is at least theta. We keep flipping the coin, recording the number of heads we have seen as well as the total number of coin flips, and based on this information we decide if we have enough information to accept or reject the hypothesis so that we respect the given error bounds.

Graphical Representation of Sequential Test

[Figure: number of positive observations (y-axis) vs. number of observations (x-axis); the region above the diagonal is unreachable]

OK, we have already seen the kind of performance we expect to get in terms of error probability from using acceptance sampling. But how do we decide whether to accept or reject the hypothesis or make more observations? The best way to explain how this works is to give a graphical representation of the sequential test. Along the x-axis we have the number of observations made so far. Along the y-axis we have the number of positive observations seen, which in our case is the number of sample paths that have satisfied the path formula rho so far. The blue area represents unreachable points, because we can never make more positive observations than the total number of observations.

Graphical Representation of Sequential Test

[Figure: acceptance line and rejection line in the same plot; verify ρ over sample paths and continue until a line is crossed, accepting above the acceptance line and rejecting below the rejection line, starting from zero observations]

We can find an acceptance line and a rejection line given θ, δ, α, and β. We then plot a curve representing the outcome of our test, starting with zero samples and zero positive samples. As long as the curve stays between the two lines, we continue sampling. If the curve crosses the rejection line, we reject the hypothesis, and if it crosses the acceptance line, we accept the hypothesis. The two lines are, by the way, parallel.
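The decision rule can be sketched in code. The version below is my own rendering of Wald's sequential probability ratio test for H0: p ≥ p0 versus H1: p ≤ p1 (with p0 = θ + δ and p1 = θ − δ); it tracks the log likelihood ratio, which is equivalent to tracking the sample curve against the two parallel lines on the slide. The function generate_observation is a stand-in for verifying ρ over one fresh sample path.

    import math
    import random

    def sprt(generate_observation, theta, delta, alpha, beta):
        """Sequential test of H0: p >= theta + delta vs H1: p <= theta - delta.
        Returns True if P>=theta(rho) is accepted, False if it is rejected."""
        p0, p1 = theta + delta, theta - delta
        log_reject = math.log((1 - beta) / alpha)   # rejection boundary for H0
        log_accept = math.log(beta / (1 - alpha))   # acceptance boundary for H0
        log_ratio = 0.0                             # log of f1m / f0m
        while True:
            if generate_observation():              # positive observation
                log_ratio += math.log(p1 / p0)
            else:                                   # negative observation
                log_ratio += math.log((1 - p1) / (1 - p0))
            if log_ratio <= log_accept:
                return True                         # curve crossed the acceptance line
            if log_ratio >= log_reject:
                return False                        # curve crossed the rejection line

    # Usage sketch with a biased coin standing in for path verification:
    result = sprt(lambda: random.random() < 0.7,
                  theta=0.5, delta=0.01, alpha=0.01, beta=0.01)
    print(result)   # almost certainly True, since the true probability 0.7 > 0.5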

Special Case
- p0 = 1 and p1 = 1 − 2δ
- Reject at first negative observation
- Accept at stage m if p1^m ≤ β
- Sample size at most ⌈log β / log p1⌉
- "Five nines": p1 = 1 − 10^−5

  β       Maximum sample size
  10^−2   460,515
  10^−4   921,030
  10^−8   1,842,059

Now, the number of observations we need to make in order to reach a decision is rather tricky to determine in the general case, but there is a special case of some interest for which we can give a clear upper bound. Consider the case where the upper boundary of the indifference region is 1. In that case we reject the hypothesis at the first negative observation we see, and we accept the hypothesis at stage m if we have made only positive observations and p1 to the power m is at most beta. The maximum sample size is therefore the ceiling of log beta divided by log p1. So why is this an interesting case? A popular concept in reliability engineering is "five nines". We can model this by setting p0 to one and p1 to one minus 10^−5. The table shows the maximum sample size for different values of beta. For example, if we want a probability of at most 10^−8 of accepting a system as functional when the failure probability is at least 10^−5, then we need to make at most 1.8 million observations. This may sound like a lot, but think about the accuracy we get.
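As a quick check of the bound, the maximum sample sizes in the table follow directly from ⌈log β / log p1⌉ with p1 = 1 − 10^−5:

    import math

    p1 = 1 - 1e-5                                   # "five nines" reliability
    for beta in (1e-2, 1e-4, 1e-8):
        print(beta, math.ceil(math.log(beta) / math.log(p1)))
    # prints 460515, 921030, and 1842059, matching the table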

Case Study: Tandem Queuing Network
- M/Cox2/1 queue sequentially composed with M/M/1 queue
- Each queue has capacity n
- State space of size O(n^2)

[Figure: the tandem network; the first queue has a two-phase Coxian service distribution with branching probabilities a and 1 − a, the second queue has exponential service]

To give you an idea of how the statistical approach compares to a numerical approach, I present two case studies. First consider a variation of the tandem queuing network from before, consisting of two sequentially composed queues. Each queue has capacity n, and the size of the state space is quadratic in n.

Tandem Queuing Network (results) [Younes et al. 2004]

P≥0.5(true U≤T full)

[Figure: verification time in seconds (10^−2 to 10^6, log scale) vs. size of state space (10^1 to 10^11, log scale); curves labelled ε = 10^−6 (numerical) and α = β = 10^−2, δ = 0.5·10^−2 (statistical)]

Tandem Queuing Network (results) [Younes et al. 2004]

P≥0.5(true U≤T full)

[Figure: verification time in seconds (10^−2 to 10^6, log scale) vs. time bound T (10^1 to 10^4, log scale); curves labelled ε = 10^−6 (numerical) and α = β = 10^−2, δ = 0.5·10^−2 (statistical)]

Case Study: Symmetric Polling System
- Single server, n polling stations
- Stations are attended in cyclic order
- Each station can hold one message
- State space of size O(n·2^n)

[Figure: a single server attending n polling stations]

Next consider a polling system with a single server and n polling stations. The stations are attended by the server in cyclic order, and each station can hold one message. The server starts by polling station 1. If there is a message, station 1 is served and then the server goes on to the next station. If there is no message, the server immediately starts polling the next station.

Symmetric Polling System (results) [Younes et al. 2004]

serv1 ⇒ P≥0.5(true U≤T poll1)

[Figure: verification time in seconds (10^−2 to 10^6, log scale) vs. size of state space (10^2 to 10^14, log scale); curves labelled ε = 10^−6 (numerical) and α = β = 10^−2, δ = 0.5·10^−2 (statistical)]

Symmetric Polling System (results) [Younes et al. 2004]

serv1 ⇒ P≥0.5(true U≤T poll1)

[Figure: verification time in seconds (10^−2 to 10^6, log scale) vs. time bound T (10^1 to 10^3, log scale); curves labelled ε = 10^−6 (numerical) and α = β = 10^−2, δ = 0.5·10^−2 (statistical)]

Symmetric Polling System (results) [Younes et al. 2004]

serv1 ⇒ P≥0.5(true U≤T poll1), with n = 10 and T = 40

[Figure: verification time in seconds (10^−1 to 10^2, log scale) vs. half-width δ of the indifference region (10^−4 to 10^−2, log scale); one curve per error bound α = β = 10^−10, 10^−8, 10^−6, 10^−4, 10^−2, plus the numerical method with ε = 10^−6]

Tandem Queuing Network: Distributed Sampling

Use multiple machines to generate samples:
- m1: Pentium IV 3GHz
- m2: Pentium III 733MHz
- m3: Pentium III 500MHz

  n       % samples (m1/m2/m3)  time (s)   % samples (m1/m2)  time (s)   m1 only (s)
  63      70/20/10              0.46       71/29              0.50       0.58
  2047    60/26/14              1.28       70/30              1.46       1.93
  65535   65/21/14              26.29      67/33              33.89      44.85

Each sample can be generated independently, so we can generate samples on multiple machines in parallel to speed up the statistical solution method further. This table shows the verification times for a tandem queuing system of varying size. We have used up to three machines, each machine running at a different speed. The speedup is essentially linear in the added CPU speed.
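A minimal sketch of this idea in Python (not the scheme used in Ymer): observations are drawn in fixed-size batches across worker processes and then fed to the sequential test in a fixed order, which sidesteps the question of how to combine results arriving from machines of different speeds. The function one_observation is again a stand-in for simulating and verifying one path.

    import math
    import random
    from multiprocessing import Pool

    def one_observation(seed):
        """Stand-in for generating one sample path and verifying rho over it."""
        return random.Random(seed).random() < 0.7

    def parallel_sequential_test(theta, delta, alpha, beta, workers=3, batch=100):
        p0, p1 = theta + delta, theta - delta
        log_reject = math.log((1 - beta) / alpha)
        log_accept = math.log(beta / (1 - alpha))
        log_ratio, seed = 0.0, 0
        with Pool(workers) as pool:
            while True:
                # Draw a batch of independent observations in parallel.
                observations = pool.map(one_observation, range(seed, seed + batch))
                seed += batch
                for positive in observations:        # consume them in a fixed order
                    log_ratio += (math.log(p1 / p0) if positive
                                  else math.log((1 - p1) / (1 - p0)))
                    if log_ratio <= log_accept:
                        return True
                    if log_ratio >= log_reject:
                        return False

    if __name__ == "__main__":
        print(parallel_sequential_test(theta=0.5, delta=0.01, alpha=0.01, beta=0.01))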

Summary
- Acceptance sampling can be used to verify probabilistic properties of systems
- Sequential acceptance sampling adapts to the difficulty of the problem
- Statistical methods are easy to parallelize

Other Research
- Failure trace analysis: "failure scenario" [Younes & Simmons 2004a]
- Planning/Controller synthesis: CSL goals [Younes & Simmons 2004a]; rewards (GSMDPs) [Younes & Simmons 2004b]

Tools
- Ymer: statistical probabilistic model checking
- Tempastic-DTP: decision-theoretic planning with asynchronous events

References

Wald, A. 1945. Sequential tests of statistical hypotheses. Ann. Math. Statist. 16: 117–186.
Younes, H. L. S., M. Kwiatkowska, G. Norman, and D. Parker. 2004. Numerical vs. statistical probabilistic model checking: An empirical study. In Proc. TACAS-2004.
Younes, H. L. S., and R. G. Simmons. 2002. Probabilistic verification of discrete event systems using acceptance sampling. In Proc. CAV-2002.
Younes, H. L. S., and R. G. Simmons. 2004a. Policy generation for continuous-time stochastic domains with concurrency. In Proc. ICAPS-2004.
Younes, H. L. S., and R. G. Simmons. 2004b. Solving generalized semi-Markov decision processes using continuous phase-type distributions. In Proc. AAAI-2004.