Lecture 18
The Run Test
11/26/2018 SA3202, Lecture 18

Outline of Today
The Definition
The Test Statistic
The Null Distribution
The Definition

The Randomness Test
A basic assumption of many statistical procedures is that the observations are random. The randomness test checks whether this assumption is valid.

Consider a sequence of n observations, each of which can be classified as either Success (S) or Failure (F). For example:

SSS FF SSS FFF SSSSSSS

We wish to test H0: the observations are random.
It is convenient to regard the observations as the result of a production process in which manufactured items emerge in sequence and each item is classified as either Defective (F) or Non-defective (S). If the process is “in control”, the sequence of S’s and F’s will be random; otherwise, the process might produce periodic runs of defective items.

Key Idea
The idea is to look at the groupings of S’s and F’s, and to test whether this grouping implies lack of process control (non-randomness).

Run: a maximal subsequence of like elements.

Let R be the number of runs in a sequence. For example, SSSSS FF SSS FFF SSSSS has R=5.
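The definition of a run can be checked mechanically. The following is a minimal sketch in Python; the function name count_runs is an illustrative choice, not part of the lecture, and the sequence is assumed to be given as a string of symbols with the display spaces removed.

```python
from itertools import groupby


def count_runs(sequence):
    """Return R, the number of maximal blocks of identical consecutive symbols.

    count_runs is a hypothetical helper name used only for this illustration.
    """
    # groupby yields one group per maximal block of equal neighbouring items,
    # so counting the groups counts the runs.
    return sum(1 for _ in groupby(sequence))


# The sequence from the slide, with the display spaces removed.
example = "SSSSS FF SSS FFF SSSSS".replace(" ", "")
print(count_runs(example))  # 5
```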
Main Idea
Lack of randomness would be indicated by a “pattern” in the sequence of S’s and F’s, which results in either a relatively large or a relatively small number of runs. For example, arrangements of 9 S’s and 9 F’s might look like these:

SS FF S FF SS F SSS F S FFF    R=10
SF SF SF SF SF SF SF SF SF    R=18
SSSSSSSSS FFFFFFFFF    R=2
Remark
In general, for a sequence of n1 S’s and n2 F’s, we have
The minimum R=2
The maximum R=2 n1 when n1=n2
The maximum R=2 min(n1,n2)+1 when n1 and n2 are not equal

Example
For a sequence of 7 S’s and 3 F’s, the maximum R=2 min(7,3)+1=7.
The Null Distribution of R
The null distribution of R can be obtained by elementary counting.

Example
Find the null distribution of R for a sequence of 4 S’s and 3 F’s.

There are 35 ways (7 choose 4) to arrange 4 S’s and 3 F’s in a sequence. Under H0, all these arrangements are equally likely. Thus finding Pr(R=r) is equivalent to counting the arrangements that have r runs.

For R=2, there are 2 sequences: SSSS FFF and FFF SSSS. Thus Pr(R=2)=2/35.

For R=3, there are
3 sequences of the form S-F-S: S FFF SSS, SS FFF SS, SSS FFF S
2 sequences of the form F-S-F: F SSSS FF, FF SSSS F
Thus Pr(R=3)=(3+2)/35=5/35.
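The counting argument can be reproduced by brute force for small sequences. The sketch below lists every arrangement of n1 S’s and n2 F’s and tallies the number of runs in each; the function name is hypothetical, and the approach is only practical for short sequences because the number of arrangements grows quickly.

```python
from collections import Counter
from itertools import combinations, groupby


def run_counts_by_enumeration(n1, n2):
    """Tally how many arrangements of n1 S's and n2 F's have each value of R.

    Hypothetical helper for illustration; feasible only for small n1 + n2.
    """
    n = n1 + n2
    tally = Counter()
    # Each arrangement is determined by the positions occupied by the S's.
    for s_positions in combinations(range(n), n1):
        chosen = set(s_positions)
        seq = ["S" if i in chosen else "F" for i in range(n)]
        runs = sum(1 for _ in groupby(seq))
        tally[runs] += 1
    return dict(sorted(tally.items()))


counts = run_counts_by_enumeration(4, 3)
total = sum(counts.values())
print(total)   # 35
print(counts)  # {2: 2, 3: 5, 4: 12, 5: 9, 6: 6, 7: 1}
```

Dividing each count by the total gives Pr(R=r) under H0, for example Pr(R=3)=5/35.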
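A Theorem
Theorem Suppose that n1 S’s and n2 F’s are arranged at random, and let R denote the number of runs. Then, writing C(a,b) for the binomial coefficient “a choose b” (with C(a,b)=0 when b>a), for k=1,2,…

Pr(R=2k+1)={C(n1-1,k)C(n2-1,k-1)+C(n1-1,k-1)C(n2-1,k)}/C(n1+n2,n1)
Pr(R=2k)=2C(n1-1,k-1)C(n2-1,k-1)/C(n1+n2,n1)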
Example
For a sequence of n1=4 S’s and n2=3 F’s, there are 35 equally likely arrangements, and we have
Pr(R=2)=2/35
Pr(R=3)=(3+2)/35=5/35
Pr(R=4)=(2×3×2)/35=12/35
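The exact probabilities can also be computed directly. The sketch below assumes the standard closed form for the null distribution of the number of runs: an even value R=2k arises from splitting both the S’s and the F’s into k blocks (two possible starting symbols), and an odd value R=2k+1 from splitting one symbol into k+1 blocks and the other into k. The function name runs_pmf is an illustrative choice, and both n1 and n2 are assumed to be at least 1.

```python
from math import comb


def runs_pmf(r, n1, n2):
    """Pr(R = r) for a random arrangement of n1 S's and n2 F's (n1, n2 >= 1).

    Hypothetical helper; uses the standard closed form for the runs distribution.
    """
    if r < 2:
        return 0.0
    total = comb(n1 + n2, n1)  # number of equally likely arrangements
    k, is_odd = divmod(r, 2)
    if is_odd:
        # R = 2k+1: one symbol forms k+1 blocks, the other forms k blocks.
        ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
    else:
        # R = 2k: each symbol forms k blocks; either symbol may come first.
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    return ways / total


for r in range(2, 8):
    print(r, round(runs_pmf(r, 4, 3) * 35))  # numerators: 2, 5, 12, 9, 6, 1
```

math.comb returns 0 when the lower index exceeds the upper one, which handles values of r beyond the maximum number of runs.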
Large Sample Theory
For large n1 and n2, the distribution of R is asymptotically normal with
E(R)=2n1n2/(n1+n2)+1,
Var(R)=2n1n2(2n1n2-n1-n2)/{(n1+n2)^2 (n1+n2-1)}
Example
A true-false examination was constructed with the answers running in the following sequence:

T FF T F T F TT F T FF T F T F TT F

We wish to test whether this sequence indicates a departure from randomness. Here n1=10 and n2=10. A test of approximate size 5% rejects H0 if R<=6 or R>=16. The observed R=16, so we reject H0 and conclude that there is some evidence of non-randomness.
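As a cross-check on this example, the large-sample normal approximation can be applied to the same sequence. The sketch below is one possible implementation, not the lecture’s own procedure: the function name is hypothetical, no continuity correction is used, and with only 10 observations of each kind the normal approximation is rough.

```python
from itertools import groupby
from math import erfc, sqrt


def runs_z_test(sequence):
    """Return (R, E(R), Var(R), z, two-sided p) using the normal approximation.

    Hypothetical helper; assumes exactly two distinct symbols and applies
    no continuity correction.
    """
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    n = n1 + n2
    r = sum(1 for _ in groupby(sequence))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    z = (r - mean) / sqrt(var)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return r, mean, var, z, p_value


answers = "T FF T F T F TT F T FF T F T F TT F".replace(" ", "")
r, mean, var, z, p_value = runs_z_test(answers)
print(r, mean, round(z, 2))  # 16 11.0 2.3
```

Here E(R)=11 and Var(R)=36000/7600, so the observed R=16 lies about 2.3 standard deviations above its null mean, which agrees with the rejection at the 5% level.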
Remarks
Remark 1 The runs test may be used to detect non-randomness in a sequence of numerical observations X1 X2 … Xn taken over time. The key idea is to classify each observation as either “High” or “Low” compared with some reference or standard value, and then apply the runs test.

Remark 2 The runs test may also be used to test for a difference between two populations (as a simple alternative to the Mann-Whitney test). The key idea is to arrange all the observations in order of magnitude and then label each observation as coming from either the 1st sample (X) or the 2nd sample (Y), thus obtaining a sequence of X’s and Y’s. A difference between the two populations would then be indicated by a small number of runs, so we reject H0 when R is too small.
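The classification step in Remark 1 can be sketched as follows. The choices here are assumptions made for illustration only: the sample median is used as the reference value when none is supplied, observations equal to the reference are dropped, and the data values are invented.

```python
from itertools import groupby
from statistics import median


def high_low_runs(values, reference=None):
    """Label each value High (H) or Low (L) and count the runs.

    Hypothetical helper. Assumptions: the sample median is the default
    reference, and values equal to the reference are discarded.
    """
    if reference is None:
        reference = median(values)
    labels = ["H" if x > reference else "L" for x in values if x != reference]
    runs = sum(1 for _ in groupby(labels))
    return "".join(labels), runs


# Invented data: a steady upward trend gives very few runs,
# while a strictly alternating series gives very many.
print(high_low_runs([1, 2, 3, 4, 5, 6, 7, 8]))  # ('LLLLHHHH', 2)
print(high_low_runs([1, 8, 2, 7, 3, 6, 4, 5]))  # ('LHLHLHLH', 8)
```

The resulting H/L sequence is then tested exactly like a sequence of S’s and F’s.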