Monitoring Bernoulli Processes
William H. Woodall
Virginia Tech

2 Outline
• Introduction to Bernoulli processes
• Geometric control chart
• Effect of estimation error on geometric control chart
• Bernoulli and geometric CUSUM charts
• Comparing methods using steady-state performance
• Effect of estimation error on the CUSUM charts
• Generalizations
• Conclusions

3 Overview
• Consider monitoring attribute data in which each item is classified as “nonconforming” or “conforming.” Such data arise in industrial and health-related applications.
• We may have 100% inspection of items.
• The data stream is often modeled as a sequence of independent Bernoulli random variables X_1, X_2, X_3, …, with P(ith item is nonconforming) = P(X_i = 1) = p.
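The model above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the count convention (each geometric count includes the nonconforming item that ends it) is an assumption.

```python
import random

def bernoulli_stream(p, n, seed=0):
    """Return n independent Bernoulli(p) indicators (1 = nonconforming)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

def geometric_counts(xs):
    """Convert a 0/1 stream into counts of items inspected up to and
    including each nonconforming item (assumed convention)."""
    counts, run = [], 0
    for x in xs:
        run += 1
        if x == 1:
            counts.append(run)
            run = 0
    return counts  # items after the last nonconforming one are left uncounted

xs = bernoulli_stream(p=0.01, n=100_000)
ys = geometric_counts(xs)
```

The geometric counts carry the same information as the individual 0/1 observations, which is why charts can be built on either representation.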

4 Monitoring Bernoulli Processes
• When the nonconforming rate, p, is small, traditional methods such as the Shewhart p-chart are inadequate.
• It is inefficient to group items artificially into samples of size n when data are available successively on individual items.
• Many methods have been proposed, including the geometric control chart.
• The Bernoulli CUSUM chart and the equivalent geometric CUSUM chart have the best performance in detecting a sustained shift in p.


6 Note that points plotted below the LCL of the geometric chart indicate process deterioration, whereas points above the UCL indicate process improvement. Generally p_0 will be unknown and must be estimated from a Phase I sample. Three approaches have been proposed: binomial sampling, negative binomial sampling, and the use of a Bayes estimator.


11 Performance Metrics
• Charts should be compared based on the average number of observations until a signal occurs, the ANOS. Charts are designed so that the in-control ANOS, ANOS_0, is a specified value.
• Equivalently, we can consider the average run length (ARL), the average number of points plotted until a signal occurs, since ANOS = ARL/p (each plotted point corresponds to an average of 1/p inspected items).
• When p_0 is estimated, the actual in-control ANOS value becomes a random variable. We would like its average to equal the specified ANOS_0, but we also need the standard deviation of the in-control ANOS to be low enough that practitioners can be confident of getting close to the desired value.
• Previous work on the effect of estimation on the geometric chart has considered only the average in-control ANOS value, not the standard deviation.
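The relation ANOS = ARL/p can be illustrated with a small sketch. The slides do not give the chart limits in text form, so the probability-limit formulas below are an assumption (a commonly used form for a geometric count Y with P(Y ≤ y) = 1 − (1 − p_0)^y), as are the function names and the signal rule "Y < LCL or Y > UCL".

```python
import math

def geometric_limits(p0, alpha):
    """Two-sided probability limits for a geometric count (assumed form)."""
    lcl = math.log(1 - alpha / 2) / math.log(1 - p0)
    ucl = math.log(alpha / 2) / math.log(1 - p0)
    return lcl, ucl

def signal_probability(p, lcl, ucl):
    """P(Y < lcl or Y > ucl) for an integer-valued count when the true rate is p."""
    below = 1 - (1 - p) ** (math.ceil(lcl) - 1)  # P(Y <= ceil(lcl) - 1)
    above = (1 - p) ** math.floor(ucl)           # P(Y > floor(ucl))
    return below + above

p0 = 0.001
lcl, ucl = geometric_limits(p0, alpha=0.0027)    # nominal 1/alpha is about 370.4
arl = 1 / signal_probability(p0, lcl, ucl)       # average number of plotted points
anos = arl / p0                                  # average number of inspected items
```

Because the count is discrete, the attained in-control ARL generally differs from the nominal 1/α, so in practice the limits would be tuned to the specified ANOS_0.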

12 Figure 1: (a) the in-control ARL_avg and (b) the in-control SDARL. The desired ARL_0 is 370.4.

13 The in-control ARL_avg value converges relatively quickly to the desired value, but the in-control SDARL converges relatively slowly to zero. This means that for the geometric chart to have reliable and predictable in-control performance, the Phase I sample size must be quite large. The required sample size can be an order of magnitude higher than previously recognized.
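The point about the mean versus the standard deviation can be explored numerically. The following is a rough sketch under stated assumptions, not the calculation behind the slides: it assumes binomial Phase I sampling of m items, a geometric chart with a lower limit only, and a continuous (non-integer) limit; all function names are hypothetical.

```python
import math

def binom_logpmf(x, m, p):
    """Log of the Binomial(m, p) probability of x nonconforming items."""
    return (math.lgamma(m + 1) - math.lgamma(x + 1) - math.lgamma(m - x + 1)
            + x * math.log(p) + (m - x) * math.log1p(-p))

def actual_anos(p_hat, p0, anos0):
    """In-control ANOS of a chart designed with p_hat when the true rate is p0."""
    alpha = 1.0 / (anos0 * p_hat)                     # per-point false-alarm rate the design targets
    lcl = math.log1p(-alpha) / math.log1p(-p_hat)     # continuous limit: P(Y <= lcl | p_hat) = alpha
    signal_prob = -math.expm1(lcl * math.log1p(-p0))  # 1 - (1 - p0)**lcl under the true rate
    return 1.0 / (p0 * signal_prob)                   # ANOS = ARL / p0

def anos_moments(m, p0, anos0):
    """Mean and standard deviation of the in-control ANOS over Phase I samples."""
    total = mean = second = 0.0
    for x in range(1, m):                             # x = 0 and x = m give no usable estimate
        p_hat = x / m
        w = math.exp(binom_logpmf(x, m, p0))
        if w == 0.0 or anos0 * p_hat <= 1.0:          # negligible weight, or design not achievable
            continue
        a = actual_anos(p_hat, p0, anos0)
        total += w
        mean += w * a
        second += w * a * a
    mean /= total
    second /= total
    return mean, math.sqrt(second - mean * mean)

anos_avg, sd_anos = anos_moments(m=10_000, p0=0.01, anos0=100_000)
```

For m = 10,000 and p_0 = 0.01 the output can be compared with the corresponding geometric-chart entries of Table 1; repeating the call with larger m shows how much more slowly the standard deviation shrinks than the bias in the mean.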

14 Performance with sequential sampling

15 Geometric and Bernoulli CUSUM Charts
The upper-sided geometric CUSUM statistics are
S_i = max(0, S_{i-1} - Y_i + k_G), i = 1, 2, 3, …,
where S_0 = 0, Y_i is the ith geometric count, and k_G is determined from a likelihood ratio to detect a shift from p_0 to p_1 = δ p_0. A signal is given when S_i > h_G.
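The recursion above can be sketched as follows. The slides do not spell out the likelihood-ratio formula, so the expression for k_G is an assumption: it is the ratio that appears when the geometric log-likelihood ratio for p_1 versus p_0 is written as r_2 − r_1 Y and divided by r_1. Rounding to an integer is a further simplifying assumption, and the function names are hypothetical.

```python
import math

def reference_value_kG(p0, delta):
    """k_G = r2 / r1 from the geometric log-likelihood ratio, rounded to an integer."""
    p1 = delta * p0
    r1 = -math.log((1 - p1) / (1 - p0))
    r2 = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    return round(r2 / r1)

def geometric_cusum(counts, k_g, h_g):
    """Return the CUSUM path and the 1-based index of the first signal (or None)."""
    s, path, signal_at = 0, [], None
    for i, y in enumerate(counts, start=1):
        s = max(0, s - y + k_g)       # short counts (frequent nonconforming items) push S up
        path.append(s)
        if signal_at is None and s > h_g:
            signal_at = i
    return path, signal_at

k_g = reference_value_kG(p0=0.01, delta=2)
path, signal_at = geometric_cusum([10, 100, 5, 5], k_g, h_g=100)
```

The control limit h_G = 100 here is arbitrary; in practice it would be chosen to give the specified ANOS_0.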

16 The upper-sided Bernoulli CUSUM statistics are
B_i = max(0, B_{i-1} + X_i - k_B), i = 1, 2, 3, …,
where B_0 = 0, X_i is the ith Bernoulli observation, and k_B = 1/k_G is determined from a likelihood ratio to detect a shift from p_0 to p_1. A signal is given when B_i > h_B. The geometric and Bernoulli CUSUM charts are equivalent if the Bernoulli chart is instead started at B_0 = 1 - k_B and h_B = (h_G + k_G - 1) / k_G, since immediately after each nonconforming item S = k_G B - k_G + 1.
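The equivalence can be checked empirically. The sketch below assumes an integer k_G with k_B = 1/k_G, the head start B_0 = 1 − k_B, and the threshold mapping h_B = (h_G + k_G − 1)/k_G, which follows from the rescaling S = k_G B − k_G + 1 at each nonconforming item. Exact fractions are used so that floating-point rounding cannot cause spurious disagreement; function names and parameter values are hypothetical.

```python
from fractions import Fraction
import random

def bernoulli_cusum_signal(xs, k_b, h_b, b0):
    """Item number at which the Bernoulli CUSUM first exceeds h_b, or None."""
    b = b0
    for n, x in enumerate(xs, start=1):
        b = max(Fraction(0), b + x - k_b)
        if b > h_b:
            return n
    return None

def geometric_cusum_signal(xs, k_g, h_g):
    """Item number at which the geometric CUSUM first exceeds h_g, or None."""
    s, run = 0, 0
    for n, x in enumerate(xs, start=1):
        run += 1
        if x == 1:                       # a geometric count Y = run is completed
            s = max(0, s - run + k_g)
            run = 0
            if s > h_g:
                return n
    return None

def charts_agree(p, k_g, h_g, n_items, seed):
    """Simulate one Bernoulli(p) stream and compare the two signal times."""
    rng = random.Random(seed)
    xs = [1 if rng.random() < p else 0 for _ in range(n_items)]
    k_b = Fraction(1, k_g)
    h_b = Fraction(h_g + k_g - 1, k_g)   # threshold mapping assumed above
    b0 = 1 - k_b                         # head start that aligns the two charts
    return (bernoulli_cusum_signal(xs, k_b, h_b, b0)
            == geometric_cusum_signal(xs, k_g, h_g))

agree = all(charts_agree(p, 69, 150, 20_000, seed)
            for p in (0.01, 0.02) for seed in range(5))
```

Both functions report the signal time in items inspected, so agreement means the two charts signal on exactly the same item.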

17 [Figure: Bernoulli CUSUM chart with control limit h_B.] ALARM: CUSUM ≥ 2.7. The CUSUM moves up when a malformation occurs and moves down (or stays at zero) for a normal birth.

18 Performance Metrics
• Charts should be compared based on the average number of observations until a signal occurs, the ANOS.
• Steady-state random-shift models should be used when comparing methods. Under this model, the shift in p occurs after process monitoring has been underway, and it may occur at any time.

19 Differences in Steady-State Models
Fixed-shift model.
Random-shift model.

20 Misconceptions about the Geometric CUSUM Chart
• For a zero-state analysis, a natural headstart feature is present for the geometric CUSUM chart.
• For a steady-state analysis, the geometric CUSUM chart is considered better than the Bernoulli CUSUM chart in some cases, but only because the fixed-shift model is used.

21 Fixed- vs. Random-Shift Model
Conclusions differ substantially depending on which model is used for the geometric CUSUM chart.
Columns: p; Geometric, Fixed-Shift; Geometric, Random-Shift; Bernoulli (Reynolds & Stoumbos, 2000)
p = .010: 29202, 29249
p = .015: 2582, 2752, 2753
p = .020: 804, 895, 897
p = .025: 424, 489
p = .030: 282, 330
p = .040: 166, 199

22 The effects of parameter estimation are more significant when the tuned shift size δ is smaller, when the desired ANOS_0 value is larger, and when the p_0 value is smaller. [Figure with panels (a) and (b)]

23 TABLE 1. Values of ANOS_avg (upper entry) and SDANOS (lower entry) for Phase I sample size m. The first three columns are for the geometric chart with LCL; the last column is for the Bernoulli CUSUM chart with δ = 3.

               Geometric Chart with LCL              Bernoulli CUSUM, δ = 3
m \ p_0        0.01        0.001       0.0005        0.001
10,000         101,000     110,001     120,011       278,470
               20,095      67,725      101,906       722,294
20,000         100,500     105,000     110,002       168,120
               14,156      46,259      67,575        235,629
50,000         100,200     102,000     104,000       123,231
               8,933       28,629      40,986        91,269
100,000        100,100     101,000     102,000       111,011
               6,312       20,095      28,562        54,645
200,000        100,050     100,500     101,000       105,371
               4,462       14,156      20,047        35,535
500,000        100,020     100,200     100,400       102,118
               2,821       8,933       12,622        21,380
1,000,000      100,010     100,100     100,200       101,055
               1,995       6,312       8,912         14,864
2,000,000      100,005     100,050     100,100       100,523
               1,410       4,461       6,297         10,421
5,000,000      100,002     100,020     100,040       100,209
               892         2,821       3,981         6,561
∞              100,000     100,000     100,000       100,055
               0           0           0             0

24 Two Generalizations of Bernoulli CUSUM Chart
• Steiner et al. (2000) used a logistic regression model to let p_0 vary from item to item. This is widely used in the risk-adjusted monitoring of surgical outcomes, where the health characteristics of patients can vary widely.
• Ryan et al. (2011) extended the approach to the case where there are more than two outcomes, e.g., manufactured items are classified as good, fair, or bad.
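The first generalization can be sketched as follows. This is a minimal illustration, assuming the usual log-likelihood-ratio scores for testing an odds ratio of 1 against a chosen odds ratio R_A, with the item-specific risk p_t coming from a logistic model. The coefficients, covariate, and function names are hypothetical and are not taken from Steiner et al. (2000).

```python
import math

def logistic_risk(x, beta0, beta1):
    """Item-specific in-control risk p_t from a logistic model (hypothetical coefficients)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def cusum_weight(y, p_t, odds_ratio):
    """Log-likelihood-ratio score for odds ratio 1 versus `odds_ratio` at risk p_t."""
    return y * math.log(odds_ratio) - math.log(1.0 - p_t + odds_ratio * p_t)

def risk_adjusted_cusum(outcomes, risks, odds_ratio, h):
    """Return the CUSUM path and the 1-based index of the first signal (or None)."""
    c, path, signal_at = 0.0, [], None
    for t, (y, p_t) in enumerate(zip(outcomes, risks), start=1):
        c = max(0.0, c + cusum_weight(y, p_t, odds_ratio))
        path.append(c)
        if signal_at is None and c > h:
            signal_at = t
    return path, signal_at

# Hypothetical covariate values (e.g., a risk score) and outcomes (1 = adverse event).
scores = [0.5, 2.0, 1.0, 3.5]
risks = [logistic_risk(x, beta0=-3.0, beta1=0.5) for x in scores]
path, signal_at = risk_adjusted_cusum([0, 1, 0, 1], risks, odds_ratio=2.0, h=4.5)
```

When every item has the same risk p_0, the ratio of the two scores reduces to the Bernoulli CUSUM reference value, so this chart is a direct extension of the Bernoulli CUSUM rather than a different method.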


26 Example of a two-sided risk-adjusted CUSUM chart (provided by Stefan H. Steiner)

27 Conclusions
• It is important to consider the variation in performance as well as the expected in-control performance of control charts when parameters are estimated.
• The Phase I sample sizes required for reliable in-control performance of geometric control charts can be impractically large.
• The steady-state performance of methods for monitoring with Bernoulli data should be evaluated using the random-shift model.
• Using a fixed-shift model has led to conclusions that the geometric CUSUM chart is better than the Bernoulli CUSUM chart for detecting an increase in p, even though the two methods can be designed to be equivalent.

28 Conclusions (continued)
• The Bernoulli CUSUM chart is much more adversely affected by estimation error than the geometric control chart, requiring much larger Phase I sample sizes.
• Because required sample sizes can be too large to be practical, the method of Steiner and MacKay (2004) is highly recommended for identifying continuous product or process variables to monitor in place of the attribute approach. This can lead to more information, much smaller Phase I sample sizes, and greater ability to detect process changes and to improve the process.

